LNCS 2251 



Yuan Y. Tang Victor Wickerhauser
Pong C. Yuen Chun-hung Li (Eds.) 



Wavelet Analysis 
and Its Applications 

Second International Conference, WAA 2001 
Hong Kong, China, December 2001 
Proceedings 



Lecture Notes in Computer Science 2251

Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 




Preface 



The first international conference on wavelet analysis and its applications was
held in China in 1999. Following the success of the first conference, the second
international conference (ICWAA 2001) was held in Hong Kong in December 2001.
The objective of this conference is to provide a forum for researchers working
on both wavelet theory and its applications. Through the sharing of ideas and
discussion of the state of the art in wavelet theory and applications, ICWAA 2001
aims to stimulate future development, explore novel applications, and exchange
ideas for developing robust solutions.

By August 2001, we had received 67 full papers submitted from all over 
the world. To ensure the quality of the conference and proceedings, each paper 
was reviewed by three reviewers. After a thorough review process, the program 
committee selected 24 regular papers for oral presentation and 27 short papers 
for poster presentation. In addition to these 24 oral presentations, there were 3 
invited talks delivered by distinguished researchers, namely Prof. John Daugman 
from Cambridge University, UK, Prof. Bruno Torresani from Inria, France, and 
Prof. Victor Wickerhauser, from Washington University, USA. We must add 
that the program committee and the reviewers did an excellent job within a 
tight schedule. 

We wish to thank all the authors for submitting their work to ICWAA 2001 
and all the participants, whether you came as a presenter or an attendee. We 
hope that there was ample time for discussion and opportunity to make new 
acquaintances. Finally, we hope that you experienced an interesting and exciting 
conference and enjoyed your stay in Hong Kong. 



October 2001 



Yuan Y. Tang, Victor Wickerhauser 
Pong C. Yuen, C. H. Li 




Berlin  Heidelberg  New York  Barcelona  Hong Kong  London  Milan  Paris  Tokyo




Yuan Y. Tang Victor Wickerhauser 
Pong C. Yuen Chun-hung Li (Eds.) 



Wavelet Analysis 
and Its Applications 



Second International Conference, WAA 2001 
Hong Kong, China, December 18-20, 2001 
Proceedings 




Series Editors 



Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 

Volume Editors 

Yuan Y. Tang 
Pong C. Yuen 
Chun-hung Li 

Hong Kong Baptist University 
Department of Computer Science 

Kowloon Tong, Hong Kong
E-mail: {yytang/pcyuen/chli}@comp.hkbu.edu.hk

Victor Wickerhauser 

Washington University, Department of Mathematics 
Campus Box 1146, Cupples I 
St. Louis, Missouri 63130, USA 
E-mail: victor@math.wustl.edu 



Cataloging-in-Publication Data applied for 

Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Wavelet analysis and its applications : second international conference ; 
proceedings / WAA 2001, Hong Kong, China, December 18-20, 2001. 

Yuan Y. Tang ... (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; 
London ; Milan ; Paris ; Tokyo : Springer, 2001 
(Lecture notes in computer science ; Vol. 2251) 

ISBN 3-540-43034-2 



CR Subject Classification (1998): E.4, H.5, I.4, C.3, I.5
ISSN 0302-9743 

ISBN 3-540-43034-2 Springer-Verlag Berlin Heidelberg New York



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York
a member of BertelsmannSpringer Science+Business Media GmbH

http://www.springer.de

© Springer-Verlag Berlin Heidelberg 2001 
Printed in Germany 

Typesetting: Camera-ready by author, data conversion by DA-TeX Gerd Blumenstein 
Printed on acid-free paper SPIN 10845973 06/3142 5 4 3 2 1 0 







Organization 



The Second International Conference on Wavelet Analysis and Applications is
organized by the Department of Computer Science, Hong Kong Baptist University,
and the IEEE Hong Kong Section Computer Chapter.



Organizing Committee 



Congress Chair:                Ernest C. M. Lam
General Chairs:                John Daugman, Ernest C. M. Lam
Program Chairs:                Yuan Y. Tang, Victor Wickerhauser, P. C. Yuen
Organizing Chair:              Kelvin C. K. Wong
Local Arrangement Chair:       William K. W. Cheung
Registration & Finance Chair:  K. C. Tsui
Publications Chairs:           C. H. Li, M. W. Mak
Workshop Chair:                Samuel P. M. Choi
Publicity Chair:               C. S. Huang



Sponsors 



Hong Kong Baptist University 
Croucher Foundation 

IEEE Hong Kong Section Computer Chapter 







Program Committee 




Metin Akay                Dartmouth College
Akram Aldroubi            Vanderbilt University
Claudia Angelini          Istituto per Applicazioni della Matematica
Algirdas Bastys           Vilnius University
T. D. Bui                 Concordia University
Elvir Causevic            Everest Biomedical Instrument Company
Mariantonia Cotronei      Università di Messina
Hans L. Cycon             Fachhochschule für Technik und Wirtschaft Berlin
Dao-Qing Dai              Zhongshan University
Wolfgang Dahmen           Technische Hochschule Aachen
Donggao Deng              Zhongshan University
T. N. T. Goodman          University of Dundee
D. Hardin                 Vanderbilt University
Daren Huang               Zhongshan University
Wen-Liang Hwang           Institute of Information Science
Rong-Qing Jia             University of Alberta
P. Jorgensen              University of Iowa
K. S. Lau                 Chinese University of Hong Kong
Seng-Luan Lee             National University of Singapore
Jian-Ping Li              Logistical Engineering University
Wei Lin                   Zhongshan University
Guixing Luan              Shenyang Inst. of Computing Technology
Hong Ma                   Sichuan University
Peter Oswald              Bell Laboratories, Lucent Technologies
Lizhong Peng              Peking University
Valérie Perrier           Domaine Universitaire
S. D. Riemenschneider     West Virginia University
Zuowei Shen               National University of Singapore
Guoxiang Song             XiDian University
Georges Stamon            University Rene Descartes
Chew-Lim Tan              National University of Singapore
Michael Unser             Batiment de Microtechnique
Jianzhong Wang            Sam Houston State University
Yueshen Xu                University of North Dakota
Lihua Yang                Zhongshan University
Rongmao Zhang             Shenyang Inst. of Computing Technology
Xingwei Zhou              Nankai University




Table of Contents 



Keynote Presentations 

Personal Identification in Real-Time by Wavelet Analysis of Iris Patterns
J. Daugman, OBE

Hybrid Representations of Audiophonic Signals
B. Torresani

Singularity Detection from Autocovariance via Wavelet Packets
M. V. Wickerhauser

Image Compression and Coding 

Empirical Evaluation of Boundary Policies
for Wavelet-Based Image Coding
C. Schremmer

Image-Feature Based Second Generation Watermarking
in Wavelet Domain
S. Guoxiang and W. Weiwei

A Study on Preconditioning Multiwavelet Systems
for Image Compression
W. Kim and C.-C. Li

Reduction of Blocking Artifacts in Both Spatial Domain
and Transformed Domain
W.-K. Ling and P. K.-S. Tam

Simple and Fast Subband De-blocking Technique by Discarding
the High Band Signals
W.-K. Ling and P. K.-S. Tam

A Method with Scattered Data Spline
and Wavelets for Image Compression
L. Guan and L. Feng



Video Coding and Processing

A Wavelet-Based Preprocessing for Moving Object Segmentation
in Video Sequences
L.-C. Liu, J.-C. Chien, H. Y. Chuang, and C.-C. Li

Embedded Zerotree Wavelet Coding of Image Sequence
M. Jerome and N. Ellouze

Wavelet-Based Video Compression Using
Long-Term Memory Motion-Compensated Prediction
and Context-Based Adaptive Arithmetic Coding
D. Marpe, T. Wiegand, and H. L. Cycon

Wavelets and Fractal Image Compression Based on Their Self-Similarity
of the Space-Frequency Plane of Images
Y. Ueno

Theory 

Integration of Multivariate Haar Wavelet Series
S. Heinrich, F. J. Hickernell, and R.-X. Yue

An Application of Continuous Wavelet Transform
in Differential Equations
H.-Z. Qu, C. Xu, and Z. Ruizhen

Stability of Biorthogonal Wavelet Bases in L²(R)
P. F. Curran and G. McDarby

Characterization of Dirac Edge with New Wavelet Transform
L. Yang, X. You, R. M. Haralick, I. T. Phillips, and Y. Y. Tang

Wavelet Algorithm for the Numerical Solution
of Plane Elasticity Problem
Y. Shen and W. Lin

Three Novel Models of Threshold Estimator for Wavelet Coefficients
S. Guoxiang and Z. Ruizhen

The PSD of the Wavelet-Packet Modulation
M. Li, Q. Peng, and S. Zhong

Orthogonal Multiwavelets with Dilation Factor a
S. Yang, Z. Cheng, and H. Wang

Image Processing 

A Wavelet-Based Image Indexing, Clustering,
and Retrieval Technique Based on Edge Feature
M. Kubo, Z. Aghbari, K. S. Oh, and A. Makinouchi

Wavelet Applications in Segmentation of Handwriting
in Archival Documents
C. L. Tan, R. Cao, and P. Shen

Wavelet Packets for Lighting-Effects Determination
A. Z. Kouzani and S. H. Ong

Translation-Invariant Face Feature Estimation
Using Discrete Wavelet Transform
K. Ma and X. Tang

Text Extraction Based on Nonlinear Frame
Y. Guan and L. Zhang

A Wavelet Multiresolution Edge Analysis Method
for Recovery of Depth from Defocused Images
Q. Wang, W. Hu, J. Hu, and K. Hu

Construction of Finite Non-separable Orthogonal Filter Banks
with Linear Phase and Its Application in Image Segmentation
H. Chen and S. Peng

Mixture-State Document Segmentation
Using Wavelet-Domain Hidden Markov Tree Models
Y. Y. Tang, Y. Hou, J. Song, and X. Yang

Some Experiment Results on Feature Analyses
of Stroke Sequence Free Matching Algorithms
for On-Line Chinese Character Recognition
M. L. Tak

Automatic Detection Algorithm of Connected Segments
for On-line Chinese Character Recognition
M. L. Tak

Signal Processing 

Speech Signal Deconvolution Using Wavelet Filter Banks
W. Hu and R. Linggard

A Proposal of Jitter Analysis Based on a Wavelet Transform
J. Borgosz and B. Cyganek

Skewness of Gabor Wavelets and Source Signal Separation
W. Yu, G. Sommer, and K. Daniilidis

The Application of the Wavelet Transform
to Polysomnographic Signals
M. MacCallum and A. E. A. Almaini

Wavelet Transform Method of Waveform Estimation
for Hilbert Transform of Fractional Stochastic Signals with Noise
W. Su, H. Ma, Y. Y. Tang, and M. Umeda

Multiscale Kalman Filtering of Fractal Signals
Using Wavelet Transform
J. Zhao, H. Ma, Z.-S. You, and M. Umeda

General Analytic Construction for Wavelet Low-Passed Filters
J. P. Li and Y. Y. Tang

A Design of Automatic Speech Playing System Based
on Wavelet Transform
Y. Liu, J. Cen, Q. Sun, and L. Yang

General Design of Wavelet High-Pass Filters
from Reconstructional Symbol
L. Yang, Q. Chen, and Y. Y. Tang

Realization of Perfect Reconstruction Non-uniform Filter Banks
via a Tree Structure
W.-K. Ling and P. K.-S. Tam

Set of Decimators for Tree Structure Filter Banks
W.-K. Ling and P. K.-S. Tam

Set of Perfect Reconstruction Non-uniform Filter Banks
via a Tree Structure
W.-K. Ling and P. K.-S. Tam



Systems and Applications

Joint Time-Frequency Distributions for Business Cycle Analysis
S. Md. Raihan, Y. Wen, and B. Zeng

The Design of Discrete Wavelet Transformation Chip
Z. Razak and M. Yaacob

On the Performance of Informative Wavelets
for Classification and Diagnosis of Machine Faults
H. Ahmadi, R. Tafreshi, F. Sassani, and G. Dumont

A Wavelet-Based Ammunition Doppler Radar System
S. H. Ong and A. Z. Kouzani

The Application of Wavelet Analysis Method
to Civil Infrastructure Health Monitoring
J. P. Li, S. A. Yan, and Y. Y. Tang

Piecewise Periodized Wavelet Transform and
Its Realization, Properties and Applications
W.-K. Ling and P. K.-S. Tam

Wavelet Transform and Its Application to Decomposition
of Gravity Anomalies
H. Zunze

Computations of Inverse Problem by Using Wavelet in Multi-layer Soil
B. Wu, S. Liu, and Z. Deng

Wavelets Approach in Choosing Adaptive Regularization Parameter
F. Lu, Z. Yang, and Y. Li

DNA Sequences Classification Based on Wavelet Packet Analysis
J. Zhao, X. W. Yang, J. P. Li, and Y. Y. Tang

The Application of the Wavelet Transform
to the Prediction of Gas Zones
X. W. Yang, J. Zhao, J. P. Li, J. Liu, and S. P. Zeng

Parameterizations of M-Band Biorthogonal Wavelets
Z. Zhang and D. Huang

Author Index




Personal Identification in Real-Time by Wavelet 
Analysis of Iris Patterns 



John Daugman, OBE 

The Computer Laboratory, University of Cambridge, UK 



Abstract. The central issue in pattern recognition is the relation between
within-class variability and between-class variability. These are
determined by the various degrees-of-freedom spanned by the patterns
themselves, and by the selectivity of the chosen feature encoders. An
interesting application of 2D wavelets in computer vision is the automatic
recognition of personal identity by encoding and matching the complex
patterns visible at a distance in each eye's iris. Because the iris is a
protected internal organ whose random texture is highly unique and
stable over life, it can serve as a kind of living password or passport that
one need not remember but is always in one's possession. I will describe
wavelet demodulation methods that I have developed for this problem
over the past 10 years, and which are now installed in all existing
commercial systems for iris recognition. The principle that underlies iris
recognition is the failure of a test of statistical independence performed on the
phase angle sequences of iris patterns. Quadrature 2D Gabor wavelets
spanning 3 octaves in scale enable the complex-valued assignment of
local phasor coordinates to iris patterns. The combinatorial complexity of
these phase sequences spans about 244 independent degrees-of-freedom,
and generates binomial distributions for the Hamming Distances (a
similarity metric) between different irises. In six public independent field
trials conducted so far using these algorithms, involving several millions
of iris comparisons, there has never been a single false match recorded.
The time required to locate and to encode an iris into quantized wavelet
phase sequences is 1 second. Then database searches are performed at a
rate of 100,000 irises/second. Data will be presented in this talk from 2.3
million IrisCode comparisons. This wavelet application could be used in
a wide range of settings in which persons' identities must be established
or confirmed by large scale database search, without relying upon cards,
keys, documents, secrets, passwords or PINs.
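
The Hamming distance named in the abstract can be illustrated with a short Python sketch. This is a minimal illustration and not the author's implementation: the function name, the normalization by sequence length, and the toy bit sequences are assumptions made for this example.

```python
def fractional_hamming_distance(code_a, code_b):
    """Return the fraction of positions at which two bit sequences differ.

    Hypothetical helper for illustration; both inputs are equal-length
    sequences of 0/1 values.
    """
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    if len(code_a) == 0:
        raise ValueError("codes must be non-empty")
    # Count the positions where the two sequences disagree.
    differing = sum(1 for a, b in zip(code_a, code_b) if a != b)
    return differing / len(code_a)


if __name__ == "__main__":
    first = [0, 1, 1, 0, 1, 0, 0, 1]
    second = [0, 1, 0, 0, 1, 1, 0, 1]
    print(fractional_hamming_distance(first, second))
```

A distance of 0 means the two sequences agree everywhere, while two statistically independent random sequences would be expected to disagree in about half of their positions.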



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, p. 1, 2001. 
© Springer-Verlag Berlin Heidelberg 2001




Hybrid Representations of Audiophonic Signals 



Bruno Torresani 

LATP, CMI, Universite de Provence, France 



Abstract. A new approach for modeling audio signals will be presented,
with a view to efficient encoding. The method is based upon hybrid models
featuring transient, tonal and stochastic components in the signal.
The three components are estimated and encoded independently using
a strategy very much in the spirit of transform coding. The signal models
involve nonlinear expansions on local trigonometric bases, and binary
trees of wavelet coefficients. Unlike several existing approaches, the
method does not rely on any prior segmentation of the signal. The talk
is based on joint work with L. Daudet and S. Molla.



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, p. 2, 2001. 
© Springer-Verlag Berlin Heidelberg 2001




Singularity Detection from Autocovariance via 
Wavelet Packets 



M. Victor Wickerhauser 

Department of Mathematics, Washington University, USA 



Abstract. We use the eigenvalues of a version of the autocovariance matrix
to recognize directions at which the Fourier transform of a function
is slowly decreasing, which provides us with a technique to detect
singularities in images. In very high dimensions, we show how the wavelet
packet best-basis algorithm can be used to compute these eigenvalues
approximately, at relatively low computational complexity.



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, p. 3, 2001. 
© Springer-Verlag Berlin Heidelberg 2001




Empirical Evaluation of Boundary Policies for 
Wavelet-Based Image Coding 



Claudia Schremmer 
Praktische Informatik IV

Universität Mannheim, 68131 Mannheim, Germany

schremmer@informatik.uni-mannheim.de



Abstract. The wavelet transform has become the most interesting new
algorithm for still image compression. Yet there are many parameters
within a wavelet analysis and synthesis which govern the quality of a
decoded image. In this paper, we discuss different image boundary policies
and their implications for the decoded image. A pool of gray-scale
images has been wavelet-transformed at different settings of the wavelet
filter bank and quantization threshold and with three possible boundary
policies.

Our empirical evaluation is based on three benchmarks. The first is the
perceived quality of the decoded image. The compression
rate is a second crucial factor. Finally, the best parameter settings with
regard to these two factors are weighted with the cost of implementation.
Contrary to the JPEG2000 standard, where mirror padding is implemented,
our investigation proposes circular convolution as the boundary
treatment.

Keywords: Wavelet Analysis, Boundary Policies, Empirical Evaluation 



1 Introduction 

Due to its outstanding performance in compression, the wavelet transform is
the focus of new image coding techniques such as the JPEG2000 standard [8,4].
JPEG2000 proposes a reversible (Daub 5/3-tap) and an irreversible (Daub 9/7-tap)
wavelet filter bank. However, since we were interested in how filter length
affects the quality of image coding, we investigated the orthogonal and separable
wavelet filters developed by Daubechies [2]. These belong to the group
of wavelets used most often in image coding applications. They specify a number
$n_0$ of vanishing moments: if a wavelet has $n_0$ vanishing moments, then the
approximation order of the wavelet transform is also $n_0$.

Implementations of the wavelet transform on still images entail other aspects
as well: speed, decomposition depth, and boundary treatment policies. Long filters
require more computing time than short ones. Furthermore, the (dyadic)
wavelet transform incorporates the aspect of iteration: the low-pass filter defines
an approximation of the original signal that contains only half as many
coefficients. This approximation successively builds the input for the next
approximation. For compression purposes, coefficients in the time-scale domain
are discarded and the synthesis quality improves with the number of iterations
on the approximation. Finally, the wavelet transform is mathematically defined
only within a signal; image applications thus need to solve the boundary problem.
Depending on the boundary policy selected, the number of iterations in a
wavelet transform might vary with the filter length. Moreover, the longer the
filter, the more important the boundary policy becomes.

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 4–15, 2001.
© Springer-Verlag Berlin Heidelberg 2001

In this work, we investigate the effects of three different boundary policies
in combination with different wavelet filter banks on a number of gray-scale
images. A first determining factor is the visual perception of a decoded image.
As we will see, although the quality varies strongly with the selected image, for
a given image it is relatively insensitive to the parameter settings.
A second crucial factor is therefore the expected compression rate. Finally, the
cost of implementation weights these two benchmarks. Our empirical evaluation
leads us to recommend circular convolution as the boundary treatment, contrary
to JPEG2000, which proposes padding.

The article is organized as follows. In Section 2, we cite related work on 
wavelet filter evaluation. Section 3 reviews the wavelet transform and details the 
aspects that are important for our survey. In Section 4, we present the technical 
evaluation of the wavelet transform and detail our results. The article ends in 
Section 5 with an outlook on future work. 

2 Related Work 

Villasenor's group researches wavelet filters for image compression. In [10], the
focus is on biorthogonal filters, and the evaluation is based on the information
preserved in the reference signal, while [3] focuses on a mathematically optimal
quantizer step size. In [1], the evaluation is based on lossless as well as on
subjective lossy compression performance, complexity and memory usage. An
interpretation of why the observations are made is nevertheless lacking. Strutz
has thoroughly researched the dyadic wavelet transform in [9]: the design and
construction of different wavelet filters is investigated, as are good Huffman and
arithmetic encoding strategies. An investigation of boundary policies, however,
is lacking.

3 The Wavelet Transform 

A wavelet is an (ideally) compact function, i.e., outside a certain interval it
vanishes. Implementations are based on the fast wavelet transform, where a given
wavelet (i.e., mother wavelet) is shifted and dilated so as to provide a basis in the
function space. That is, a one-dimensional function is transformed into a
two-dimensional space, where it is approximated by coefficients that depend on time
(determined by the translation parameter) and on scale, i.e., frequency
(determined by the dilation parameter). The localization of a wavelet in time spread
($\sigma_t$) and frequency spread ($\sigma_\omega$) has the property
$\sigma_t \sigma_\omega = \text{const}$. However, the
resolution in time and frequency depends on the frequency. This is the so-called
zoom phenomenon of the wavelet transform: it offers high temporal localization
for high frequencies while offering good frequency resolution for low frequencies.

3.1 Wavelet Transform and Filter Banks 

By introducing multiresolution, Mallat [7] made an important contribution to
the application of wavelet theory to multimedia: the transition from mathematical
theory to filters. Multiresolution analysis is implemented via high-pass,
respectively, band-pass filters (i.e., wavelets) and low-pass filters (i.e., scaling
functions): The detail coefficients (resulting from the high-pass, respectively,
band-pass filtering) of every iteration step are kept apart, and the iteration
starts again with the remaining approximation coefficients (from application
of the low-pass filter). This multiresolution theory is 'per se' defined only for
one-dimensional wavelets on one-dimensional signals. As still images are
two-dimensional discrete signals and two-dimensional wavelet filter design remains
an active field of research [5,6], current implementations are restricted to
separable filters. The successive convolution of filter and signal in both dimensions
opens two potential iterations:

– standard: all approximations, even in mixed terms, are iterated, and

– non-standard: only the purely low-pass filtered parts of every approximation
enter the iteration.

In this work, we concentrate on the non-standard decomposition. 
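
The iteration described at the start of this section (keep the detail coefficients apart, iterate on the approximation) can be sketched in Python for the one-dimensional case. This is a minimal sketch under stated assumptions: it uses the two-tap Haar filter pair because that filter never reaches beyond the signal border, and the function names and the orthonormal scaling by 1/sqrt(2) are choices made for this example, not taken from the evaluation software.

```python
import math


def haar_step(signal):
    """One analysis step: return (approximation, detail), each half as long."""
    if len(signal) == 0 or len(signal) % 2 != 0:
        raise ValueError("signal length must be even and non-zero")
    scale = 1.0 / math.sqrt(2.0)
    # Low-pass: scaled sums of neighboring pairs (subsampling by 2 is built in).
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    # High-pass: scaled differences of the same pairs.
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Iterate haar_step on the approximation; details are kept apart per level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    final_approx, detail_bands = haar_decompose([4.0, 4.0, 2.0, 2.0], levels=2)
    print(final_approx, detail_bands)
```

Each level halves the length of the approximation, so the total number of coefficients stays equal to the signal length for this filter.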



3.2 Image Boundary 

A digital filter is applied to a signal by convolution. Convolution, however, is
defined only within a signal. In order to result in a reversible wavelet transform,
each signal coefficient must enter into filter_length/2 calculations of
convolution (here, the subsampling process by factor 2 is already incorporated).
Consequently, every filter longer than two entries, i.e., every filter except Haar,
requires a solution for the boundary. Furthermore, images are signals of a relatively
short length (in rows and columns), thus the boundary treatment is even
more important than, e.g., in audio coding. Two common boundary policies are
padding and circular convolution.



Padding Policies. With padding, the signal is extended on either border
by filter_length-2 coefficients. Consequently, each signal coefficient
enters into filter_length/2 calculations of convolution, and the transform
is reversible. Many padding policies exist; they all have in common that each
iteration step physically increases the storage space in the wavelet domain. In [11],
a theoretical solution for the required storage space (depending on the signal, the
filter bank and the iteration level) is presented. Nevertheless, its implementation
remains complex.
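
The following Python sketch illustrates the padding idea and the resulting growth in storage. It is a sketch under stated assumptions: the helper names are hypothetical, mirror padding is taken here to repeat the border sample (other mirroring conventions exist), and the coefficient count assumes that filter windows start at every second position of the padded signal.

```python
def zero_pad(signal, filter_length):
    """Extend the signal by filter_length - 2 zeros on either border."""
    pad = filter_length - 2
    return [0] * pad + list(signal) + [0] * pad


def mirror_pad(signal, filter_length):
    """Extend the signal by mirroring filter_length - 2 samples at each border."""
    pad = filter_length - 2
    if pad > len(signal):
        raise ValueError("signal too short for this filter length")
    left = list(reversed(signal[:pad]))
    right = list(reversed(signal[len(signal) - pad:]))
    return left + list(signal) + right


def coefficients_after_step(signal_length, filter_length):
    """Number of coefficients (low-pass plus high-pass) after one padded step."""
    padded_length = signal_length + 2 * (filter_length - 2)
    # Windows start at every second position and must fit inside the padded signal.
    per_band = (padded_length - filter_length) // 2 + 1
    return 2 * per_band


if __name__ == "__main__":
    print(zero_pad([1, 2, 3, 4], 4))
    print(mirror_pad([1, 2, 3, 4], 4))
    print(coefficients_after_step(256, 4))
```

Under these assumptions, one step on 256 samples with a 4-tap filter yields 258 coefficients, which illustrates why each iteration step increases the storage space, whereas a 2-tap filter leaves the count at 256.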






Circular Convolution. The idea of circular convolution is to 'wrap' the end
of a signal to its beginning or vice versa. In so doing, circular convolution is the
only boundary treatment to maintain the number of coefficients for a wavelet
transform, thus simplifying storage management¹. A minor drawback is that the
time information contained in the time-scale domain of the wavelet-transformed
coefficients 'blurs': the coefficients in the time-scale domain that are next to the
right border (respectively, left border) also affect signal coefficients that are
located on the left (respectively, right).
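
The wrapping can be sketched in Python as one analysis step with modulo indexing. This is a minimal sketch, not the evaluated implementation: the function name is hypothetical, and the filter pair used for the demonstration is the well-known 4-tap Daubechies filter with its high-pass counterpart built by reversing the low-pass filter and alternating signs.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# 4-tap Daubechies low-pass filter.
LOW_PASS = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
            (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# High-pass filter: reversed low-pass coefficients with alternating signs.
HIGH_PASS = [((-1) ** k) * LOW_PASS[len(LOW_PASS) - 1 - k]
             for k in range(len(LOW_PASS))]


def circular_analysis_step(signal, low_pass, high_pass):
    """One analysis step where indices past the end wrap to the beginning."""
    n = len(signal)
    if n % 2 != 0 or n < len(low_pass) or n < len(high_pass):
        raise ValueError("signal length must be even and not shorter than the filter")
    approx, detail = [], []
    for start in range(0, n, 2):  # subsampling by factor 2
        approx.append(sum(low_pass[k] * signal[(start + k) % n]
                          for k in range(len(low_pass))))
        detail.append(sum(high_pass[k] * signal[(start + k) % n]
                          for k in range(len(high_pass))))
    return approx, detail


if __name__ == "__main__":
    a, d = circular_analysis_step([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                                  LOW_PASS, HIGH_PASS)
    print(len(a) + len(d))  # same count as the input signal
```

The last windows read samples from the start of the signal, which is the 'blurring' of time information across the two borders described above.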

The selected boundary policy has an important impact on the iteration behavior
of the wavelet transform. With padding policies, the iteration behavior is
not affected by the filter length. With circular convolution, however, the
decomposition depth varies with the filter length: the longer the filter, the fewer
decomposition iterations are possible. For example, for an image of 256 × 256 pixels, the
Daub-2 filter bank with 4 coefficients allows a decomposition depth of 7, while
the Daub-20 filter bank with 40 coefficients has reached signal length after only 3
decomposition levels.
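
The two depths quoted above can be reproduced with a small Python sketch. The stopping rule is an assumption chosen to be consistent with those two figures: a further level is allowed only while the current approximation is at least as long as the filter. The function name is hypothetical.

```python
def max_depth_circular(signal_length, filter_length):
    """Count decomposition levels possible under circular convolution.

    Assumed rule: iterate while the current approximation is at least as
    long as the filter (and can still be halved).
    """
    depth = 0
    length = signal_length
    while length >= filter_length and length % 2 == 0:
        length //= 2  # each level halves the approximation
        depth += 1
    return depth


if __name__ == "__main__":
    print(max_depth_circular(256, 4))   # 4-coefficient filter bank
    print(max_depth_circular(256, 40))  # 40-coefficient filter bank
```

For a row of 256 pixels this rule gives 7 levels for the 4-coefficient filter and 3 levels for the 40-coefficient filter, matching the example in the text.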

Thus, the evaluation presented in Tables 1 to 4 is based on a decomposition 
depth of level 8 for the two padding policies, while the decomposition depth for 
circular convolution varies from 7 to 3, according to the selected filter length. 

4 Empirical Evaluation 

4.1 Set-Up 

Our empirical evaluation sought the best parameter settings for the choice of 
the wavelet filter bank and for the image boundary policy to be implemented. 
The performance was evaluated according to the criteria: 

1. visual quality, 

2. compression rate, and 

3. complexity of implementation. 

The quality was rated based on the peak signal-to-noise ratio (PSNR)². The
compression rate was simulated by a simple quantization threshold: the higher
the threshold, the more coefficients in the time-scale domain are discarded, and
the higher the compression rate. More precisely, thresholding was applied
only to the parts of the image that have been high-pass filtered (respectively,
band-pass filtered) at least once. That is, the approximation of the image was
excluded from the thresholding due to its importance for the image synthesis.
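
The thresholding heuristic can be sketched in Python as follows. This is an illustrative sketch only: the function name and data layout (one approximation list plus a list of detail bands) are assumptions, as is the choice to report the discarded share relative to all coefficients, including the untouched approximation.

```python
def threshold_details(approx, details, threshold):
    """Zero all detail coefficients whose magnitude is below the threshold.

    The approximation is returned unchanged. The third return value is the
    percentage of coefficients that were discarded.
    """
    total = len(approx)
    discarded = 0
    kept_details = []
    for band in details:
        total += len(band)
        new_band = []
        for coefficient in band:
            if abs(coefficient) < threshold:
                new_band.append(0.0)  # discarded coefficient
                discarded += 1
            else:
                new_band.append(coefficient)
        kept_details.append(new_band)
    percent_discarded = 100.0 * discarded / total
    return list(approx), kept_details, percent_discarded


if __name__ == "__main__":
    result = threshold_details([0.1], [[0.5, 3.0], [0.2]], threshold=1.0)
    print(result)
```

Note that the small approximation coefficient 0.1 in the example survives, because the approximation is excluded from thresholding.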

¹ Storage space, however, expands indirectly: an image can be stored with integers,
while the coefficients in the time-scale domain require floats.

² When org(x,y) denotes the pixel value of the original image at position (x,y), and
dec(x,y) denotes the pixel value of the decoded image at position (x,y), then

$$\mathrm{PSNR}\,[\mathrm{dB}] = 10 \cdot \log_{10}\left( \frac{\sum_{x,y} 255^2}{\sum_{x,y} \left(\mathrm{org}(x,y) - \mathrm{dec}(x,y)\right)^2} \right).$$
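
A minimal Python sketch of the PSNR computation for 8-bit gray-scale images follows. It is written in the equivalent mean-squared-error form (255 squared divided by the average squared pixel difference); the function name and the representation of images as nested lists are assumptions made for this example.

```python
import math


def psnr(original, decoded):
    """Peak signal-to-noise ratio in dB for two equally sized 8-bit images.

    Images are given as lists of rows of pixel values (hypothetical layout).
    """
    squared_error = 0.0
    count = 0
    for row_org, row_dec in zip(original, decoded):
        for org, dec in zip(row_org, row_dec):
            squared_error += (org - dec) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no noise
    return 10.0 * math.log10(255.0 ** 2 / mse)


if __name__ == "__main__":
    org = [[10, 20], [30, 40]]
    dec = [[11, 21], [31, 41]]
    print(round(psnr(org, dec), 2))
```

Higher values indicate a decoded image that is closer to the original; identical images give an infinite PSNR.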






Our evaluation was set up on the six gray-scale images of size 256 × 256
pixels shown in Figure 1. These test images were chosen to
cover different features:

– many small details: Mandrill, Goldhill,

– large uniform areas: Brain, Lena, Camera, House,

– relative symmetry at the left-right and top-bottom boundaries: Mandrill, Brain,

– strong asymmetry with regard to these boundaries: Lena, Goldhill, House,

– sharp transitions between regions: Brain, Lena, Camera, House, and

– large areas of texture: Mandrill, Lena, Goldhill, House.



4.2 Results 

Image-Dependent Analysis. The detailed evaluation results for the six test 
images are presented in Tables 1 and 2. Some interesting observations made from 
these two tables and their explanations are as follows: 

– For a given image and a given quantization threshold, the PSNR remains
astonishingly constant for different filter banks and different boundary policies.

– At high thresholds, Mandrill and Goldhill yield the worst quality. This is due
to the large amount of detail in both images.

– House produces the overall best quality at a given threshold. This is due to
its large uniform areas.

– Due to their symmetry, Mandrill and Brain show good quality results with
padding policies.

– The percentage of discarded information at a given threshold is far higher
for Brain than for Mandrill. This is due to the uniform black background of
Brain, which produces small coefficients in the time-scale domain, compared
to the many small details in Mandrill, which produce large coefficients and
thus do not fall below the threshold.

– With regard to the heuristic for compression, and for a given image and
boundary policy, Table 2 reveals that

• the compression ratio for zero padding increases with increasing filter
length,

• the compression ratio for mirror padding decreases with increasing filter
length, and

• the compression ratio for circular convolution varies, but most often stays
almost constant.

The explanation is as follows. Padding an image with zeros, i.e., black pixel
values, most often produces a sharp contrast to the original image. This
sharp transition between the signal and the padding coefficients results in
large coefficients in the fine scales, while the coarse scales remain unaffected.
This observation, however, is put into a different perspective for longer filters:
with longer filters, the constant run of zeros at the boundary does not show


Empirical Evaluation of Boundary Policies 






strong variations, and the detail coefficients in the time-scale domain thus 
remain small. Hence, a given threshold cuts off fewer coefficients when the 
filter is longer. With mirror padding, the padded coefficients for shorter filters 
represent a good heuristic for the signal adjacent to the boundary. Increasing 
filter length and accordingly, longer padded areas, however, introduces too 
much ‘false’ detail information into the signal, resulting in many large detail 
coefficients that ‘survive’ the threshold. 
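The mechanism described above can be illustrated with a small one-dimensional sketch. It extends a signal with each of the three boundary policies, filters it with the high-pass filter of the 4-tap Daubechies wavelet, and reports the percentage of detail coefficients that fall below a threshold. The helper names (`extend`, `detail_coefficients`, `discarded_percentage`), the half-sample mirror convention and the toy signal are assumptions for illustration only, not the evaluation code behind Tables 1 to 4.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Low-pass coefficients of the 4-tap Daubechies filter.
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass (detail) filter: g[k] = (-1)^k * h[L-1-k].
G = [(-1) ** k * H[len(H) - 1 - k] for k in range(len(H))]


def extend(signal, pad, policy):
    """Extend a 1-D signal by `pad` samples on each side."""
    n = len(signal)
    if pad > n:
        raise ValueError("padding longer than the signal")
    if policy == "zero":
        left, right = [0.0] * pad, [0.0] * pad
    elif policy == "mirror":
        # Reflect the samples next to each boundary (half-sample symmetry).
        left = [signal[pad - 1 - i] for i in range(pad)]
        right = [signal[n - 1 - i] for i in range(pad)]
    elif policy == "circular":
        # Periodic continuation, equivalent to circular convolution.
        left, right = list(signal[n - pad:]), list(signal[:pad])
    else:
        raise ValueError("unknown policy: " + policy)
    return left + list(signal) + right


def detail_coefficients(signal, policy):
    """One level of high-pass filtering followed by subsampling by two."""
    ext = extend(signal, len(G) - 2, policy)
    return [sum(G[m] * ext[start + m] for m in range(len(G)))
            for start in range(0, len(ext) - len(G) + 1, 2)]


def discarded_percentage(coeffs, threshold):
    """Percentage of coefficients whose magnitude falls below the threshold."""
    discarded = sum(1 for c in coeffs if abs(c) < threshold)
    return 100.0 * discarded / len(coeffs)


signal = [100.0] * 8  # a uniform bright region
for policy in ("zero", "mirror", "circular"):
    details = detail_coefficients(signal, policy)
    print(policy, discarded_percentage(details, threshold=10.0))
```

On this uniform signal, zero padding creates a jump at both boundaries, so two large detail coefficients survive the threshold, whereas mirror padding and circular convolution leave all detail coefficients near zero. The filter-length effects discussed in the text are not reproduced by such a short filter.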



Image-Independent Analysis. The above examples reveal that most phe- 
nomena are signal-dependent. As a signal-dependent determination of best- 
suited parameters remains academic, our further reflections are made on the 
average image quality and the average amount of discarded information as pre- 
sented in Tables 3 and 4 and the corresponding Figures 2 and 3. 

Figure 2 visualizes the coding quality of the images, averaged over the six
test images. The four plots represent the quantization thresholds A = 10, 20, 45
and 85. In each plot, the visual quality (quantified via PSNR) is plotted
against the filter length of the Daubechies wavelet filters. The three boundary
policies (zero padding, mirror padding and circular convolution) are plotted
separately. As expected, the quality decreases with an increasing
threshold. More important are the following statements:

— Within a given threshold, and for a given boundary policy, the PSNR remains 
almost constant. This means that the quality of the coding process depends 
hardly or not at all on the selected wavelet filter bank. 

— Within a given threshold, mirror padding produces the best results, followed 
by circular convolution. Zero padding performs worst. 

— The gap between the performance of the boundary policies increases with 
an increasing threshold. 

Nevertheless, the differences observed above with 0.28 dB maximum gap (at the 
threshold A = 85 and the filter length of 40 coefficients) are so marginal that 
they do not actually influence visual perception. 

As the visual perception is neither influenced by the choice of filter nor by the 
boundary policy, the coding performance has been studied as a second bench- 
mark. The following observations are made in Figure 3. With a short filter length 
(4 to 10 coefficients), the compression ratio is almost identical for the differ- 
ent boundary policies. This is not astonishing as short filters involve only little 
boundary treatment, and the relative importance of the boundary coefficients 
with regard to the signal coefficients is negligible. More important for our inves- 
tigation is that: 

— The compression heuristic of the three boundary policies is inversely
related to their quality performance. In other words, mirror padding
discards the fewest coefficients at a given quantization threshold,
while zero padding discards the most.






— With an increasing threshold, the gap between the compression ratios of the 
three policies narrows. 

In the overall evaluation, we have seen that mirror padding performs best with
regard to quality, while it performs worst with regard to compression. Conversely,
zero padding performs best with regard to compression and worst with regard
to quality. Circular convolution lies midway in both aspects. However, the gap
in compression is far larger than the differences in quality.
Considering the coding complexity of the padding approaches, compared to
the easy implementation of circular convolution (see Section 3.2), we strongly
recommend implementing circular convolution as the boundary policy in image
coding.

5 Conclusion 

We have discussed and evaluated the strengths and weaknesses of different
boundary policies in relation to various orthogonal wavelet filter banks.
Contrary to the JPEG2000 coding standard, where mirror padding is suggested for
boundary treatment, we have proven that circular convolution is superior in the
overall combination of quality performance, compression performance and ease
of implementation.

In future work, we will improve our heuristic for the compression rate and
rely on the calculation of a signal’s entropy as presented in [12] and [9].

References 

1. Michael D. Adams and Faouzi Kossentini. Performance Evaluation of Reversible
Integer-to-Integer Wavelet Transforms for Image Compression. In Proc. IEEE
Data Compression Conference, page 514 ff., Snowbird, Utah, March 1999.

2. Ingrid Daubechies. Ten Lectures on Wavelets, volume 61. SIAM. Society for
Industrial and Applied Mathematics, Philadelphia, PA, 1992.

3. Javier Garcia-Frias, Dan Benyamin, and John D. Villasenor. Rate Distortion
Optimal Parameter Choice in a Wavelet Image Communication System. In Proc. IEEE
International Conference on Image Processing, pages 25-28, Santa Barbara, CA,
October 1997.

4. ITU. JPEG2000 Image Coding System. Final Committee Draft Version 1.0 -
FCD 15444-1. International Telecommunication Union, March 2000.

5. Jelena Kovacevic and Wim Sweldens. Wavelet Families of Increasing Order in
Arbitrary Dimensions. IEEE Trans. on Image Processing, 9(3):480-496, March
2000.

6. Jelena Kovacevic and Martin Vetterli. Nonseparable Two- and Three-Dimensional
Wavelets. IEEE Trans. on Signal Processing, 43(5):1269-1273, May 1995.

7. Stephane Mallat. A Wavelet Tour of Signal Processing. Academic Press, San
Diego, CA, 1998.

8. Athanassios N. Skodras, Charilaos A. Christopoulos, and Touradj Ebrahimi.
JPEG2000: The Upcoming Still Image Compression Standard. In 11th Portuguese
Conference on Pattern Recognition, pages 359-366, Porto, Portugal, May 2000.

9. Tilo Strutz. Untersuchungen zur skalierbaren Kompression von Bildsequenzen
bei niedrigen Bitraten unter Verwendung der dyadischen Wavelet-Transformation.
PhD thesis, Universität Rostock, Germany, May 1997.

10. John D. Villasenor, Benjamin Belzer, and Judy Liao. Wavelet Filter Evaluation
for Image Compression. IEEE Trans. on Image Processing, 2:1053-1060, August
1995.

11. Mladen Victor Wickerhauser. Adapted Wavelet Analysis from Theory to Software.
A. K. Peters Ltd., Natick, MA, 1998.

12. Mathias Wien and Claudia Meyer. Adaptive Block Transform for Hybrid Video
Coding. In Proc. SPIE Visual Communications and Image Processing, pages 153-162,
San Jose, CA, January 2001.




[Figure 1, panels: (d) Camera, (e) Goldhill, (f) House]

Fig. 1. Test images for the evaluation











Table 1. Detailed results of the quality evaluation with the PSNR on the six
test images. The mean values over the images are given in Table 3.

Quality of visual perception: PSNR [dB]

          | Mandrill                   | Brain                      | Lena
Wavelet   | zero     mirror   circular | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.  | padding  padding  convol.

Threshold: 10 (excellent overall quality)
Daub-2    | 18.012   17.996   18.238   | 18.141   18.151   18.197   | 16.392   16.288   16.380
Daub-3    | 18.157   18.187   18.221   | 18.429   18.434   18.433   | 16.391   16.402   16.350
Daub-4    | 18.169   18.208   17.963   | 18.353   18.340   18.248   | 16.294   16.355   16.260
Daub-5    | 18.173   18.167   18.186   | 18.279   18.280   18.259   | 16.543   16.561   16.527
Daub-10   | 17.977   17.959   18.009   | 18.291   18.300   18.479   | 16.249   16.278   16.214
Daub-15   | 17.938   17.934   18.022   | 18.553   18.543   18.523   | 16.267   16.304   16.288
Daub-20   | 17.721   17.831   18.026   | 18.375   18.357   18.466   | 16.252   16.470   16.238

Threshold: 20 (good overall quality)
Daub-2    | 14.298   14.350   14.403   | 16.610   16.611   16.577   | 14.775   14.765   14.730
Daub-3    | 14.414   14.469   14.424   | 16.743   16.755   16.721   | 14.758   14.817   14.687
Daub-4    | 14.231   14.239   14.276   | 16.637   16.628   16.734   | 14.862   14.918   14.735
Daub-5    | 14.257   14.216   14.269   | 16.747   16.751   16.854   | 14.739   14.946   14.815
Daub-10   | 14.268   14.274   14.360   | 16.801   16.803   16.878   | 14.624   14.840   14.699
Daub-15   | 14.246   14.258   14.300   | 16.822   16.810   16.852   | 14.395   14.631   14.477
Daub-20   | 14.046   14.065   14.227   | 16.953   16.980   16.769   | 14.252   14.597   14.353

Threshold: 45 (medium overall quality)
Daub-2    | 10.905   10.885   10.910   | 14.815   14.816   14.747   | 13.010   13.052   12.832
Daub-3    | 10.988   10.970   10.948   | 15.187   15.150   15.052   | 12.766   13.138   12.903
Daub-4    | 10.845   10.839   10.885   | 15.014   15.029   15.056   | 12.820   13.132   12.818
Daub-5    | 10.918   10.969   10.949   | 15.036   15.031   14.999   | 12.913   13.301   12.983
Daub-10   | 10.907   10.929   10.913   | 14.989   15.013   15.212   | 12.447   13.066   12.795
Daub-15   | 10.845   10.819   10.815   | 15.093   15.133   15.064   | 12.577   12.954   12.686
Daub-20   | 10.784   10.872   10.843   | 14.975   14.934   14.882   | 12.299   12.877   12.640

Threshold: 85 (poor overall quality)
Daub-2    |  9.095    9.121    9.135   | 13.615   13.621   13.783   | 11.587   11.902   11.577
Daub-3    |  9.206    9.184    9.124   | 13.787   13.784   13.759   | 11.437   11.793   11.516
Daub-4    |  9.160    9.152    9.168   | 13.792   13.815   13.808   | 11.539   11.806   11.636
Daub-5    |  9.171    9.208    9.203   | 13.837   13.850   13.705   | 11.692   11.790   11.872
Daub-10   |  9.207    9.193    9.206   | 13.870   13.922   14.042   | 11.128   11.430   11.555
Daub-15   |  9.083    9.161    9.126   | 13.731   13.795   13.917   | 11.128   11.610   11.475
Daub-20   |  9.071    9.142    9.204   | 13.852   13.800   13.974   | 11.142   11.694   11.597

          | Camera                     | Goldhill                   | House
Wavelet   | zero     mirror   circular | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.  | padding  padding  convol.

Threshold: 10 (excellent overall quality)
Daub-2    | 17.334   17.346   17.371   | 16.324   16.266   16.412   | 19.575   19.563   19.608
Daub-3    | 17.532   17.560   17.625   | 16.322   16.296   16.358   | 19.640   19.630   19.621
Daub-4    | 17.529   17.591   17.577   | 16.241   16.212   16.342   | 19.560   19.558   19.584
Daub-5    | 17.489   17.448   17.389   | 16.214   16.193   16.154   | 19.613   19.555   19.566
Daub-10   | 17.539   17.541   17.383   | 16.307   16.223   16.317   | 19.482   19.388   19.732
Daub-15   | 17.747   17.530   17.523   | 16.012   16.067   16.033   | 19.653   19.671   19.726
Daub-20   | 17.474   17.527   17.484   | 16.322   16.245   16.319   | 19.550   19.495   19.524

Threshold: 20 (good overall quality)
Daub-2    | 14.387   14.365   14.396   | 13.937   13.940   13.898   | 17.446   17.480   17.471
Daub-3    | 14.473   14.452   14.426   | 13.872   13.892   13.858   | 17.525   17.594   17.612
Daub-4    | 14.438   14.438   14.430   | 13.828   13.836   13.753   | 17.468   17.647   17.351
Daub-5    | 14.460   14.505   14.427   | 13.743   13.743   13.711   | 17.454   17.458   17.465
Daub-10   | 14.468   14.400   14.409   | 13.762   13.785   13.798   | 17.592   17.635   17.689
Daub-15   | 14.408   14.406   14.414   | 13.687   13.730   13.697   | 17.260   17.276   17.266
Daub-20   | 14.384   14.370   14.362   | 13.700   13.782   13.731   | 17.476   17.449   17.240

Threshold: 45 (medium overall quality)
Daub-2    | 12.213   12.242   12.131   | 12.033   12.034   11.876   | 15.365   15.437   15.155
Daub-3    | 12.032   12.122   12.188   | 11.961   12.006   11.889   | 14.957   15.476   15.118
Daub-4    | 12.150   12.178   12.145   | 11.855   11.891   11.925   | 14.906   15.080   15.180
Daub-5    | 12.077   12.133   12.120   | 11.848   11.844   11.801   | 15.159   15.382   15.244
Daub-10   | 12.061   12.197   12.093   | 11.760   11.917   11.726   | 14.776   15.246   14.872
Daub-15   | 12.074   12.059   12.176   | 11.725   11.855   11.753   | 14.810   15.090   14.969
Daub-20   | 11.798   11.975   12.048   | 11.763   11.803   11.703   | 14.420   15.033   14.609

Threshold: 85 (poor overall quality)
Daub-2    | 11.035   11.161   11.041   | 10.791   10.805   10.844   | 13.530   13.804   13.703
Daub-3    | 11.092   11.176   11.080   | 10.943   10.916   10.754   | 13.488   13.726   13.627
Daub-4    | 10.943   11.152   11.046   | 10.861   10.904   10.740   | 13.524   13.613   13.510
Daub-5    | 11.018   11.148   11.129   | 10.826   10.935   10.738   | 13.114   13.903   13.111
Daub-10   | 10.815   11.064   10.987   | 10.824   10.972   10.771   | 13.158   13.695   13.434
Daub-15   | 10.779   11.005   10.982   | 10.737   10.838   10.607   | 13.073   13.357   13.123
Daub-20   | 10.688   11.031   11.090   | 10.709   10.819   10.766   | 13.173   13.257   13.678




Table 2. Heuristic for the compression rate of the coding parameters of Table 1:
The higher the percentage of discarded information in the time-scale domain is,
the higher is the compression ratio. The mean values over the images are given
in Table 4.

Discarded information in the time-scale domain: Percentage [%]

          | Mandrill                   | Brain                      | Lena
Wavelet   | zero     mirror   circular | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.  | padding  padding  convol.

Threshold: A = 10 (excellent overall quality)
Daub-2    | 42       41       41       | 83       83       83       | 78       79       79
Daub-3    | 43       42       42       | 84       84       84       | 78       80       80
Daub-4    | 44       42       41       | 85       84       84       | 78       79       79
Daub-5    | 45       41       41       | 85       84       84       | 79       79       80
Daub-10   | 53       38       41       | 87       82       84       | 79       74       78
Daub-15   | 59       35       40       | 88       78       82       | 82       69       77
Daub-20   | 65       32       40       | 89       74       83       | 83       64       77

Threshold: A = 20 (good overall quality)
Daub-2    | 63       63       63       | 91       91       91       | 87       89       88
Daub-3    | 64       63       64       | 92       91       91       | 87       89       89
Daub-4    | 65       63       63       | 92       91       91       | 87       88       89
Daub-5    | 66       62       63       | 92       91       91       | 87       90       89
Daub-10   | 70       58       63       | 93       89       91       | 88       83       88
Daub-15   | 74       56       62       | 93       86       91       | 89       79       88
Daub-20   | 78       51       63       | 94       82       91       | 90       74       88

Threshold: A = 45 (medium overall quality)
Daub-2    | 86       86       87       | 96       96       96       | 94       95       95
Daub-3    | 86       86       87       | 96       96       96       | 94       95       95
Daub-4    | 87       86       87       | 96       96       96       | 94       95       96
Daub-5    | 87       85       87       | 96       96       96       | 95       94       96
Daub-10   | 88       82       87       | 97       94       96       | 94       91       96
Daub-15   | 90       79       87       | 97       91       96       | 95       88       96
Daub-20   | 92       74       87       | 97       89       96       | 96       83       96

Threshold: A = 85 (poor overall quality)
Daub-2    | 96       96       97       | 98       98       98       | 97       98       98
Daub-3    | 96       96       97       | 98       98       98       | 97       98       98
Daub-4    | 96       96       97       | 98       98       98       | 97       97       98
Daub-5    | 96       95       97       | 98       98       98       | 98       97       98
Daub-10   | 97       93       97       | 98       97       98       | 97       94       98
Daub-15   | 97       91       97       | 98       95       98       | 98       92       98
Daub-20   | 97       86       98       | 98       93       99       | 98       88       99

          | Camera                     | Goldhill                   | House
Wavelet   | zero     mirror   circular | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.  | padding  padding  convol.

Threshold: A = 10 (excellent overall quality)
Daub-2    | 78       80       79       | 70       71       70       | 79       80       80
Daub-3    | 77       79       78       | 70       71       71       | 79       80       80
Daub-4    | 77       79       78       | 71       71       70       | 79       80       79
Daub-5    | 77       78       78       | 71       71       70       | 79       79       79
Daub-10   | 77       74       76       | 73       67       69       | 80       72       78
Daub-15   | 80       71       75       | 77       63       68       | 82       66       77
Daub-20   | 81       66       74       | 79       58       68       | 83       59       76

Threshold: A = 20 (good overall quality)
Daub-2    | 86       88       88       | 85       87       86       | 87       88       88
Daub-3    | 86       88       88       | 85       87       86       | 87       88       88
Daub-4    | 86       88       88       | 86       86       86       | 87       88       87
Daub-5    | 86       87       88       | 86       86       86       | 87       87       88
Daub-10   | 86       85       87       | 86       83       86       | 87       81       87
Daub-15   | 88       82       86       | 89       79       86       | 89       75       87
Daub-20   | 88       78       86       | 89       73       86       | 89       69       87

Threshold: A = 45 (medium overall quality)
Daub-2    | 93       95       95       | 94       96       95       | 93       95       94
Daub-3    | 93       95       95       | 95       96       95       | 94       95       95
Daub-4    | 94       95       95       | 95       95       95       | 94       94       95
Daub-5    | 94       94       95       | 95       95       96       | 94       94       95
Daub-10   | 93       93       95       | 95       92       96       | 94       89       95
Daub-15   | 94       91       95       | 95       89       96       | 95       84       94
Daub-20   | 95       88       95       | 96       85       96       | 95       78       95

Threshold: A = 85 (poor overall quality)
Daub-2    | 97       98       98       | 97       98       98       | 97       98       98
Daub-3    | 97       98       98       | 98       98       98       | 97       97       97
Daub-4    | 97       98       98       | 98       98       98       | 97       97       98
Daub-5    | 97       97       98       | 98       98       99       | 97       97       98
Daub-10   | 97       96       98       | 98       96       99       | 97       93       98
Daub-15   | 97       95       98       | 98       93       99       | 97       89       98
Daub-20   | 98       93       98       | 98       90       99       | 98       84       99








Table 3. Average quality of the six test images. Figure 2 gives a more ‘readable’
plot of these digits.

Average image quality: PSNR [dB]

          | Threshold A = 10           | Threshold A = 20
Wavelet   | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.
Daub-2    | 17.630   17.602   17.701   | 15.242   15.252   15.246
Daub-3    | 17.745   17.752   17.768   | 15.298   15.330   15.288
Daub-4    | 17.691   17.711   17.662   | 15.244   15.284   15.213
Daub-5    | 17.719   17.701   17.680   | 15.233   15.270   15.257
Daub-10   | 17.641   17.615   17.689   | 15.253   15.290   15.306
Daub-15   | 17.695   17.675   17.686   | 15.136   15.185   15.168
Daub-20   | 17.616   17.654   17.676   | 15.135   15.207   15.114

          | Threshold A = 45           | Threshold A = 85
Wavelet   | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.
Daub-2    | 13.057   13.078   12.942   | 11.609   11.736   11.681
Daub-3    | 12.982   13.144   13.016   | 11.659   11.763   11.643
Daub-4    | 12.932   13.025   13.002   | 11.637   11.740   11.651
Daub-5    | 12.992   13.110   13.016   | 11.610   11.806   11.626
Daub-10   | 12.823   13.061   12.935   | 11.500   11.713   11.666
Daub-15   | 12.854   12.985   12.911   | 11.422   11.628   11.538
Daub-20   | 12.673   12.916   12.788   | 11.439   11.624   11.718
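As a small worked check of how the averages in Table 3 relate to the per-image values in Table 1, the sketch below averages the six PSNR values for Daub-2 with zero padding at threshold 10. The dictionary name is an assumption chosen for illustration; the numbers are copied from Table 1.

```python
# PSNR values [dB] from Table 1: Daub-2, zero padding, threshold 10.
daub2_zero_threshold10 = {
    "Mandrill": 18.012,
    "Brain": 18.141,
    "Lena": 16.392,
    "Camera": 17.334,
    "Goldhill": 16.324,
    "House": 19.575,
}

# Arithmetic mean over the six test images.
mean_psnr = sum(daub2_zero_threshold10.values()) / len(daub2_zero_threshold10)
print(f"{mean_psnr:.3f}")
```

The result agrees with the corresponding entry of Table 3 (17.630 dB), which suggests that the averages are plain arithmetic means over the six images.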



[Figure 2: four plots of quality (PSNR) against the length of the wavelet
filter, one per threshold (panel titles include "Quality - Threshold 45" and
"Quality - Threshold 85"), each with curves for zero padding, mirror padding
and circular convolution]

Fig. 2. Visual quality of the test images at the quantization thresholds A =
10, 20, 45 and 85. The values correspond to Table 3






Table 4. Average bitrate heuristic of the six test images. Figure 3 gives a more
‘readable’ plot of these digits.

Average discarded information: Percentage [%]

          | Threshold A = 10           | Threshold A = 20
Wavelet   | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.
Daub-2    | 72.0     72.3     72.0     | 83.2     84.3     84.0
Daub-3    | 71.8     72.7     72.5     | 83.5     84.3     84.3
Daub-4    | 72.3     72.5     71.8     | 83.8     84.0     84.0
Daub-5    | 72.7     72.0     72.0     | 84.0     83.8     84.2
Daub-10   | 74.8     67.8     71.0     | 85.0     79.8     83.7
Daub-15   | 78.0     63.7     69.8     | 87.0     76.2     83.3
Daub-20   | 80.0     58.8     69.7     | 88.0     71.2     83.5

          | Threshold A = 45           | Threshold A = 85
Wavelet   | zero     mirror   circular | zero     mirror   circular
          | padding  padding  convol.  | padding  padding  convol.
Daub-2    | 92.7     93.8     93.7     | 97.0     97.7     97.8
Daub-3    | 93.0     93.8     93.8     | 97.2     97.5     97.7
Daub-4    | 93.3     93.5     94.0     | 97.2     97.3     97.8
Daub-5    | 93.5     93.0     94.2     | 97.3     97.0     98.0
Daub-10   | 93.5     90.2     94.2     | 97.3     94.8     98.0
Daub-15   | 94.3     87.0     94.0     | 97.5     92.5     98.0
Daub-20   | 95.2     82.8     94.2     | 97.8     89.0     98.7




Fig. 3. Average bitrate heuristic of the test images at the quantization thresholds 
A = 10, 20, 45 and 85. The values correspond to Table 4 



Image-Feature Based Second Generation 
Watermarking in Wavelet Domain 



Song Guoxiang and Wang Weiwei 

School of Science, Xidian University 
Xi’an, 710071, P.R.China 



Abstract. An image-feature-based second generation watermarking
scheme is proposed in this paper. A host image is first transformed
into wavelet coefficients, and features are extracted from the lowest
approximation. Then a watermark sequence is inserted into all high frequency
coefficients corresponding to the extracted featured approximation
coefficients. The original host image is not needed for watermark detection,
but the positions of the featured approximation coefficients are necessary
for robust detection. The correlation between the embedded watermark and
all high frequency coefficients of a possibly corrupted watermarked image
that correspond to the approximation coefficients at the same positions as the
original featured approximation coefficients is calculated and compared
to a predefined threshold to decide whether the watermark is present.
Experimental results show that the watermark is very robust to common image
processing, lossy compression in particular.

Keywords: image feature, digital watermarking, wavelet transform 



1 Introduction 

Lately, multimedia and computer networking have developed and expanded
rapidly. This has created an increasing need for systems that protect the
copyright ownership of digital images. Digital watermarking is the embedding of
a mark into digital content that can later be unambiguously detected to allow
assertions about the ownership or provenance of the data. This makes
watermarking an emerging technique to prevent digital piracy. To be effective,
a watermark must be imperceptible within its host, discreet to prevent
unauthorized removal, easily extracted by the owner, and robust to incidental
and intentional distortions.

Most of the recent work in watermarking can be grouped into two categories:
spatial domain methods and frequency domain methods. Kutter et al. [1] referred
to both the spatial-domain and the transform-domain techniques as first
generation watermarking schemes and introduced the concept of second generation
watermarking schemes which, unlike the first generation watermarking schemes,
employ the notion of data features. For images, features can be edges,
corners, textured areas or parts in the image with specific characteristics.
Features suitable for watermarking should have three basic properties: First, invariance


Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 16-21, 2001.
© Springer-Verlag Berlin Heidelberg 2001






to noise (lossy compression, additive and multiplicative noise, etc.). Second,
covariance to geometrical transformations (rotation, translation, sub-sampling,
change of aspect ratio, etc.). Third, localization (cropping the data should not
alter the remaining feature points).

In this paper, we deal with wavelet domain image watermarking using the notion
of second generation watermarking schemes. Previous wavelet domain
watermarking schemes [2, 3, 4, 5, 6, 7, 8] added a watermark to a selected
set of DWT coefficients in chosen subbands. The methods proposed in [2, 3, 6, 8]
require the original image for detection, while the methods in [4, 5, 7] do not.
However, the method in [4] needs the embedding position and the corresponding
subband label as well as two threshold values. For the method in [5], if the
watermarked image is tampered with, the number of coefficients that are greater
than the larger threshold may not be equal to the size of the embedded
watermark. This causes a problem for detection when calculating the correlation
between the embedded watermark and the coefficients of a possibly modified
watermarked image whose absolute magnitude is above the larger threshold. The
method in [7] embeds watermarks into all HL and LH coefficients at levels 2 to 4,
which results in poor image quality.

Based on the concept of second generation watermarking schemes, we propose
a wavelet domain watermarking method which embeds watermarks into all
high frequency coefficients corresponding to the featured lowest approximation
coefficients. First, the host image is transformed using the DWT and features
are extracted from the lowest approximation using the method in [9]. Then the
watermark is embedded into all subband coefficients corresponding to the
featured lowest approximation coefficients. Finally, the modified coefficients
are inversely transformed to form the watermarked image. In the watermark
detection, the original image is not needed, but for more robust detection, the
positions of the featured lowest approximation coefficients of the original
image are required. These positions can be encrypted using private key
encryption and stored in the image header. The correlation between the embedded
watermark and all high frequency coefficients of a possibly corrupted
watermarked image that correspond to the lowest approximation coefficients at
the same positions as the original featured approximation coefficients is
calculated and compared to a predefined threshold to decide whether the
watermark is present or not. Experimental results show that the watermark is
very robust to common image processing, lossy compression in particular. Even
when the watermarked image is compressed by JPEG with a quality factor of
one percent, the watermark is still detected.

2 The Proposed Method 

The original image is first decomposed using the DWT with the 8-tap Daubechies
orthogonal filter [10] down to scale N to obtain the multiresolution subbands
$LH_n$, $HL_n$, $HH_n$ ($n = 1, 2, \ldots, N$) and the lowest resolution
approximation $LL_N$. There exists a tree structure between the coefficients [11],
as shown in Fig. 1 (for N = 3). The tree relation can be defined as follows:

\[ tree(LL_N(x,y)) = tree(HL_N(x,y)) \cup tree(LH_N(x,y)) \cup tree(HH_N(x,y)) \tag{1} \]

\[ tree(HL_n(x,y)) = tree(HL_{n-1}(2x-1, 2y-1)) \cup tree(HL_{n-1}(2x, 2y-1)) \cup tree(HL_{n-1}(2x-1, 2y)) \cup tree(HL_{n-1}(2x, 2y)) \tag{2} \]

where $n = N, N-1, \ldots, 2$. For $tree(LH_n(x,y))$ and $tree(HH_n(x,y))$
($n = N, N-1, \ldots, 2$), the definition is similar to (2). At the finest level,

\[ tree(HL_1(x,y)) = HL_1(x,y) \]
\[ tree(LH_1(x,y)) = LH_1(x,y) \]
\[ tree(HH_1(x,y)) = HH_1(x,y) \]

For the experiments reported in this paper, N is taken as N = 4.
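The tree relation can be sketched as a short recursive enumeration. The function names `subtree` and `tree_of_ll` are hypothetical, and the sketch assumes that every node of a subband tree is counted together with its four children, which is the reading under which a tree rooted at an LL_4 coefficient has 255 members besides the root.

```python
def subtree(band, level, x, y):
    """Node (band, level, x, y) and all its descendants, following Eq. (2).

    Coordinates are 1-based, as in the paper. Assumption: the node itself is
    part of its tree, in addition to the trees of its four children.
    """
    nodes = [(band, level, x, y)]
    if level > 1:
        children = ((2 * x - 1, 2 * y - 1), (2 * x, 2 * y - 1),
                    (2 * x - 1, 2 * y), (2 * x, 2 * y))
        for cx, cy in children:
            nodes.extend(subtree(band, level - 1, cx, cy))
    return nodes


def tree_of_ll(x, y, n_levels):
    """All high-frequency coefficients below LL_N(x, y), following Eq. (1)."""
    nodes = []
    for band in ("HL", "LH", "HH"):
        nodes.extend(subtree(band, n_levels, x, y))
    return nodes


tree = tree_of_ll(1, 1, 4)
print(len(tree))                                      # all descendants of LL_4(1, 1)
print(sum(1 for _, level, _, _ in tree if level == 1))  # finest-level coefficients
```

For N = 4 each of the three subbands contributes 1 + 4 + 16 + 64 = 85 coefficients, giving 255 in total, of which 192 lie at the finest level.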

2.1 Feature Extraction 

We use the method in [9] to extract features of the image. The difference is
that we extract features from the lowest approximation component $LL_N$ of the
DWT of the image, rather than from the original image. Since the size of $LL_N$ is
$1/4^N$ times that of the original image, the time needed for extracting features
is greatly reduced. The feature extraction scheme is based on a decomposition of
the image using Mexican-Hat wavelets. In two dimensions, the response of the
Mexican-Hat mother wavelet is defined as:

\[ \psi(x, y) = \left(2 - (x^2 + y^2)\right) e^{-(x^2 + y^2)/2} \tag{3} \]

The isotropic nature of the Mexican-Hat filter is well suited for detecting
point-features. We briefly describe the feature-detection procedure as follows.
First, define the feature-detection function $P_{ij}(\cdot,\cdot)$ as:

\[ P_{ij}(k, l) = \left| M_i(k, l) - \gamma M_j(k, l) \right| \tag{4} \]

where $M_i(k, l)$ and $M_j(k, l)$ represent the responses of Mexican-Hat wavelets at
the image location $(k, l)$ for scales i and j respectively. For an image A, the
wavelet response $M_i(k, l)$ is given by:

\[ M_i(k, l) = \left\langle 2^{-i} \psi\left(2^{-i} \cdot (k, l)\right), A \right\rangle \tag{5} \]

where $\langle \cdot, \cdot \rangle$ denotes the convolution of its operands. We only
consider wavelets on a dyadic scale. Thus, the normalizing constant is given by
$\gamma =$ […]. The operator $|\cdot|$ returns the absolute value of its parameter.
Here we take i = 2 and j = 4 as in [9]. Second, determine the points of local
maxima of $P_{ij}(\cdot,\cdot)$. These maxima correspond to the set of potential
feature-points. A circular neighborhood with a radius of 5 points is used to
determine the local maxima. Finally, accept a local maximum of
$P_{ij}(\cdot,\cdot)$ as a feature-point if the variance of the image pixels in
the neighborhood of the point is higher than a threshold, which we take as 20.
Here a 7x7 neighborhood around the point is used for computing the local
variance.
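The last two steps of the procedure (local maxima in a circular neighborhood of radius 5, followed by a local-variance test on a 7x7 window with threshold 20) can be sketched as follows. The detector map `p` is taken as a given input rather than computed from Eq. (4), and the function names, the clipping of windows at the image border and the toy data are assumptions for illustration.

```python
import math


def mexican_hat(x, y):
    """Mexican-Hat mother wavelet of Eq. (3)."""
    r2 = x * x + y * y
    return (2.0 - r2) * math.exp(-r2 / 2.0)


def is_local_max(p, row, col, radius=5):
    """True if no value inside the circular neighborhood exceeds p[row][col]."""
    rows, cols = len(p), len(p[0])
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr * dr + dc * dc > radius * radius:
                continue  # outside the circular neighborhood
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols and p[r][c] > p[row][col]:
                return False
    return True


def local_variance(image, row, col, half=3):
    """Variance of the pixels in a (2*half+1)^2 window, clipped at the border."""
    rows, cols = len(image), len(image[0])
    values = [image[r][c]
              for r in range(max(0, row - half), min(rows, row + half + 1))
              for c in range(max(0, col - half), min(cols, col + half + 1))]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def feature_points(p, image, var_threshold=20.0):
    """Local maxima of the detector map whose local variance is high enough."""
    return [(row, col)
            for row in range(len(p)) for col in range(len(p[0]))
            if is_local_max(p, row, col)
            and local_variance(image, row, col) > var_threshold]


# Toy data: a single bright pixel; the image doubles as the detector map.
image = [[0.0] * 15 for _ in range(15)]
image[7][7] = 100.0
print(feature_points(image, image))
```

In the toy example the flat background forms plateaus that pass the local-maximum test but fail the variance test, so only the bright pixel is reported. This is the role the variance threshold plays in the procedure above.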






2.2 Watermark Inserting 

The original image I is first decomposed using the DWT with the 8-tap Daubechies
orthogonal filter down to scale N = 4 to obtain the multiresolution subbands
$LH_n$, $HL_n$, $HH_n$ ($n = 1, 2, \ldots, 4$) and the lowest resolution
approximation $LL_4$. Then feature-points are extracted from $LL_4$ using the
method in Section 2.1. If $LL_4(x, y)$ is a feature-point, then watermark values
from $x \in X$ are added to all the child nodes of $tree(LL_4(x, y))$. Here X
stands for a set of watermarks x, and the elements $x_i$ of x are given by a
random noise sequence drawn from a normal distribution with zero mean and unit
variance. Since every $tree(LL_4(x, y))$ has 255 children in all, excluding the
root, the size of the watermark x, denoted by M, is given by
M = 255 x (the number of feature-points in $LL_4$).
The specific embedding method is as follows: for every feature-point
$LL_4(x, y)$ and for every $w_i \in tree(LL_4(x, y))$ with $w_i \neq LL_4(x, y)$,

\[ W_i \leftarrow w_i + \alpha |w_i| x_i \tag{6} \]

where $w_i$ and $W_i$ denote the DWT coefficients of the original and the
watermarked image, respectively, and $\alpha$ is a modulating factor; here we
take $\alpha = 0.2$. Finally, the modified multiresolution subbands are
inversely transformed to obtain the watermarked image.
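A minimal sketch of the embedding rule in Eq. (6), applied to a flat list of tree coefficients, is given below. The function names `make_watermark` and `embed`, the use of a seeded pseudo-random generator and the toy coefficients are assumptions for illustration; the DWT and the feature extraction are not part of the sketch.

```python
import random


def make_watermark(length, seed):
    """Watermark of i.i.d. samples with zero mean and unit variance."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def embed(coeffs, watermark, alpha=0.2):
    """Eq. (6): W_i = w_i + alpha * |w_i| * x_i for every tree coefficient."""
    if len(coeffs) != len(watermark):
        raise ValueError("one watermark value is needed per coefficient")
    return [w + alpha * abs(w) * x for w, x in zip(coeffs, watermark)]


# 16 feature-points with 255 tree coefficients each give M = 4080.
watermark = make_watermark(255 * 16, seed=1)
print(len(watermark))

# Hand-picked values to make the rule easy to follow.
print(embed([10.0, -4.0, 0.0], [1.0, -0.5, 2.0]))
```

Because the modification is proportional to |w_i|, large coefficients carry more watermark energy and zero coefficients are left unchanged.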



2.3 Watermark Detection 

The original image is not required in the watermark detection, but for more
robust detection, the feature-point positions of the original image are indeed
necessary. First, a possibly corrupted watermarked image $I^*$ is decomposed
in the same way as $I$ in 2.2. Then for every feature-point $LL_4^*(x, y)$, all
coefficients $w_i^* \in$ tree($LL_4^*(x, y)$) with $w_i^* \neq LL_4^*(x, y)$
are taken out, where $LL_4^*$ and $w_i^*$ respectively represent the lowest
resolution approximation and the high frequency coefficients of $I^*$. We
calculate the correlation $z$ between $w_i^*$ and all candidates $y \in X$ of
the embedded watermark $x$ as:

$$z = \frac{1}{M} \sum_{i=1}^{M} w_i^* y_i. \tag{7}$$
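A small numerical sketch of the detector response (7) follows. The coefficient values, the seed, and the two candidate watermarks are hypothetical stand-ins, and no attack is simulated; the point is only that the response for the embedded watermark is far larger than for an unrelated one.

```python
import random


def correlation(coeffs, candidate):
    """Detector response z = (1/M) * sum_i w_i * y_i."""
    m = len(coeffs)
    return sum(w * y for w, y in zip(coeffs, candidate)) / m


# Hypothetical data: stand-in coefficients, one embedded and one unrelated watermark.
rng = random.Random(1)
M = 4080
alpha = 0.2
original = [rng.uniform(-50.0, 50.0) for _ in range(M)]
embedded = [rng.gauss(0.0, 1.0) for _ in range(M)]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(M)]
marked = [w + alpha * abs(w) * x for w, x in zip(original, embedded)]

z_embedded = correlation(marked, embedded)
z_unrelated = correlation(marked, unrelated)
```

For the embedded watermark the response concentrates around $\alpha$ times the mean of $|w_i|$, while for an unrelated candidate it concentrates around zero, which is what makes a threshold test possible.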

The correlation is compared with a predefined threshold $S_z$, which is given
in [7], to determine whether a given watermark is present or not. In theory,
the threshold $S_z$ is taken as

M 

= < 8 > 

1=1 

In practice, the watermarked image would be attacked incidentally or intention- 
ally, so for robust detection, the threshold is taken as 

M 

S * = r ^E \Wi\,0<r<l 
1=1 



(9) 






Song Guoxiang and Wang Weiwei 



3 Experimental Results 

In order to confirm that the proposed watermarking scheme is effective, we
performed numerical experiments with some standard gray-scale images. Here we
describe experimental results for the standard image "Lenna" (512 x 512 pixels,
8 bits/pixel) shown in Fig. 2(a). Fig. 2(b) shows the watermarked image with
parameters $\alpha = 0.2$, $N = 4$ and $M = 4080$. Next, we tested the
robustness of the watermark against some common image processing operations
on the watermarked image of Fig. 2(b). Fig. 3 is the result of JPEG compression
with a quality factor of 1. The image after 11x11 mean filtering is shown in
Fig. 4. The image after adding white Gaussian noise of power 40 dB is shown in
Fig. 5. Fig. 6 is the clipped image with only the central 25% of the data left.
Fig. 7 shows the result of counterclockwise rotation by 10 degrees. The
response of the watermark detector and the corresponding threshold for the
untampered and attacked watermarked images are given in Tab. 1. The threshold
is calculated using equation (9) with $r = 2/3$. As shown in Tab. 1, although
the image degradation is very heavy, the watermark is still easily detected and
the detector response is well above the threshold. Numerical experiments with
other standard images have demonstrated similar results.

4 Conclusions 

An image-feature based second generation watermarking scheme in the wavelet
domain is proposed in this paper. Experiments show that the watermark is very
robust to common image processing, lossy compression and smoothing in
particular. Even for the JPEG compressed version of the watermarked image with
a quality factor of 1%, the feature-points remain salient. In future work, we
will investigate watermarking methods that are resistant to geometric attacks.



References 

1. M. Kutter, S. K. Bhattacharjee, and T. Ebrahimi, "Towards second generation
watermarking scheme," Proc. IEEE ICIP'99, vol. 1, 1999

2. D. Kundur and D. Hatzinakos, "A robust digital image watermarking method using
wavelet-based fusion," Proc. IEEE ICIP'97, vol. 1, 1997, pp. 544-547

3. X. G. Xia, C. G. Boncelet and G. R. Arce, "A multiresolution watermark for digital
images," Proc. IEEE ICIP'97, vol. 1, 1997, pp. 548-551

4. H. Inoue, A. Miyazaki, A. Yamamoto, et al., "A digital watermark based on the
wavelet transform and its robustness on image compression," Proc. IEEE ICIP'98,
vol. 2, 1998, pp. 391-423

5. R. Dugad, K. Ratakonda and N. Ahuja, "A new wavelet-based scheme for
watermarking image," Proc. IEEE ICIP'98, vol. 2, 1998, pp. 419-423

6. W. W. Zhu, Z. X. Xiong and Y. Q. Zhang, "Multiresolution watermarking for
images and video: a unified approach," Proc. IEEE ICIP'98, vol. 1, 1998,
pp. 465-468






7. H. Inoue, A. Miyazaki and T. Katsura, "An image watermarking method based
on the wavelet transform," Proc. IEEE ICIP'99, vol. 1, 1999, pp. 296-300

8. J. R. Kim and Y. S. Moon, "A robust wavelet-based digital watermarking using
level-adaptive thresholding," Proc. IEEE ICIP'99, vol. 2, 1999, pp. 226-230

9. S. K. Bhattacharjee and M. Kutter, "Compression tolerant image authentication,"
Proc. IEEE ICIP'98, vol. 1, 1998

10. I. Daubechies, "Ten Lectures on Wavelets," CBMS-NSF conference series in applied
mathematics, SIAM Ed.

11. J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients,"
IEEE Trans. on Signal Processing, vol. 41, no. 12, 1993, pp. 3445-3462



A Study on Preconditioning Multiwavelet 
Systems for Image Compression 



Wonkoo Kim and Ching-Chung Li

University of Pittsburgh, Dept. of Electrical Engineering
Pittsburgh, PA 15261, USA
wonkoo@home.com
ccl@ee.pitt.edu



Abstract. We present a study on applications of multiwavelet analysis
to image compression, where filter coefficients form matrices. As a
multiwavelet filter bank has multiple channels of inputs, we investigate the
data initialization problem by considering prefilters and postfilters that
may give more efficient representations of the decomposed data. The
interpolation postfilter and prefilter are formulated, which are capable of
providing a better approximate image at each coarser resolution level. A
design process is given to obtain both filters with compact supports,
if they exist. Image compression performances of some multiwavelet systems
are studied in comparison to those of single wavelet systems.



1 Nonorthogonal Multiwavelet Subspaces 

Let us define a multiresolution analysis of $L^2(\mathbb{R})$ generated by several scaling
functions, with an increasing sequence of function subspaces $\{V_j\}_{j\in\mathbb{Z}}$ in $L^2(\mathbb{R})$:

$$\{0\} \subset \cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots \subset L^2(\mathbb{R}). \tag{1}$$

Subspaces $V_j$ are generated by a set of scaling functions $\phi^1, \phi^2, \ldots, \phi^r$ (namely,
multiscaling functions) such that

$$V_j := \mathrm{clos}_{L^2(\mathbb{R})} \langle \phi^m_{j,k} : 1 \le m \le r,\ k \in \mathbb{Z} \rangle, \quad \forall j \in \mathbb{Z}, \tag{2}$$

i.e., $V_j$ is the closure of the linear span of $\{\phi^m_{j,k}\}_{1 \le m \le r,\, k \in \mathbb{Z}}$ in $L^2(\mathbb{R})$, where

$$\phi^m_{j,k}(x) := 2^{j/2} \phi^m(2^j x - k), \quad \forall x \in \mathbb{R}. \tag{3}$$

Then we have a sequence of multiresolution subspaces $\{V_j\}$ generated by a set
of multiscaling functions, where the resolution gets finer and finer as $j$ increases.

Let us define inter-spaces $W_j \subset L^2(\mathbb{R})$ such that $V_{j+1} := V_j \dot{+} W_j$, $\forall j \in \mathbb{Z}$,
where the plus sign with a dot ($\dot{+}$) denotes a nonorthogonal direct sum. $W_j$
is the complement to $V_j$ in $V_{j+1}$, and thus $W_j$ and $W_l$ with $j \neq l$ are disjoint
but may not be orthogonal to each other. If $W_j \perp W_l$, $\forall j \neq l$, we call them



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 22-36, 2001.
© Springer-Verlag Berlin Heidelberg 2001






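semi-orthogonal wavelet spaces [1]. By the nature of construction, subspaces $W_j$
can be generated by $r$ base functions, $\psi^1, \psi^2, \ldots, \psi^r$, that are multiwavelets. The
subspace $W_j$ is the closure of the linear span of $\{\psi^m_{j,k}\}_{1 \le m \le r,\, k \in \mathbb{Z}}$:

$$W_j := \mathrm{clos}_{L^2(\mathbb{R})} \langle \psi^m_{j,k} : 1 \le m \le r,\ k \in \mathbb{Z} \rangle, \quad \forall j \in \mathbb{Z}, \tag{4}$$

where

$$\psi^m_{j,k}(x) := 2^{j/2} \psi^m(2^j x - k), \quad \forall x \in \mathbb{R}. \tag{5}$$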

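We may express multiscaling functions and multiwavelets as vector functions:

$$\phi(x) := \begin{bmatrix} \phi^1(x) \\ \vdots \\ \phi^r(x) \end{bmatrix}, \qquad
\psi(x) := \begin{bmatrix} \psi^1(x) \\ \vdots \\ \psi^r(x) \end{bmatrix}. \tag{6}$$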



Also, in vector form, let us define

$$\phi_{j,k}(x) := 2^{j/2} \phi(2^j x - k) \quad \text{and} \quad \psi_{j,k}(x) := 2^{j/2} \psi(2^j x - k), \quad \forall x \in \mathbb{R}. \tag{7}$$

Since the multiscaling functions $\phi^m \in V_0$ and the multiwavelets $\psi^m \in W_0$
are all in $V_1$, and since $V_1$ is generated by $\{\phi^m_{1,k}(x) = 2^{1/2} \phi^m(2x - k)\}_{1 \le m \le r,\, k \in \mathbb{Z}}$,
there exist two $\ell^2$ matrix sequences $\{H_n\}_{n\in\mathbb{Z}}$ and $\{G_n\}_{n\in\mathbb{Z}}$ such that we have a
two-scale relation for the multiscaling function $\phi(x)$:

$$\phi(x) = 2 \sum_{n \in \mathbb{Z}} H_n \phi(2x - n), \quad x \in \mathbb{R}, \tag{8}$$

which is also called a two-scale matrix refinement equation (MRE), and for the
multiwavelet $\psi(x)$:

$$\psi(x) = 2 \sum_{n \in \mathbb{Z}} G_n \phi(2x - n), \quad x \in \mathbb{R}, \tag{9}$$

where $H_n$ and $G_n$ are $r \times r$ square matrices. We are interested in finite sequences
of $H_n$ and $G_n$, namely, FIR (Finite Impulse Response) filter pairs.

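Using fractal interpolation, Geronimo, Hardin, and Massopust successfully
constructed a very important multiwavelet system [2,3,4] which has two
orthogonal multiscaling functions and two orthogonal multiwavelets. Their four
matrix coefficients $H_n$ satisfy the MRE for a multiscaling function $\phi(x)$:

$$H_0 = \begin{bmatrix} \frac{3}{10} & \frac{4\sqrt{2}}{10} \\ -\frac{\sqrt{2}}{40} & -\frac{3}{20} \end{bmatrix}, \quad
H_1 = \begin{bmatrix} \frac{3}{10} & 0 \\ \frac{9\sqrt{2}}{40} & \frac{1}{2} \end{bmatrix}, \quad
H_2 = \begin{bmatrix} 0 & 0 \\ \frac{9\sqrt{2}}{40} & -\frac{3}{20} \end{bmatrix}, \quad
H_3 = \begin{bmatrix} 0 & 0 \\ -\frac{\sqrt{2}}{40} & 0 \end{bmatrix}, \tag{10}$$

and the other four matrix coefficients $G_n$ generate a multiwavelet $\psi(x)$:

$$G_0 = \begin{bmatrix} -\frac{\sqrt{2}}{40} & -\frac{3}{20} \\ \frac{1}{20} & \frac{3\sqrt{2}}{20} \end{bmatrix}, \quad
G_1 = \begin{bmatrix} \frac{9\sqrt{2}}{40} & -\frac{1}{2} \\ -\frac{9}{20} & 0 \end{bmatrix}, \quad
G_2 = \begin{bmatrix} \frac{9\sqrt{2}}{40} & -\frac{3}{20} \\ \frac{9}{20} & -\frac{3\sqrt{2}}{20} \end{bmatrix}, \quad
G_3 = \begin{bmatrix} -\frac{\sqrt{2}}{40} & 0 \\ -\frac{1}{20} & 0 \end{bmatrix}. \tag{11}$$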







Fig. 1. Geronimo-Hardin-Massopust orthogonal multiscaling functions and
multiwavelets





Fig. 2. Cardinal 2-balanced orthogonal multiscaling functions and multiwavelets 



The GHM (Geronimo-Hardin-Massopust) orthogonal multiscaling functions
are shown in Figure 1(a) and (b), and their corresponding orthogonal
multiwavelets are shown in (c) and (d). The GHM multiwavelet system has very
remarkable properties: its scaling functions and wavelets are orthogonal, have
very short supports, and are symmetric or antisymmetric, and the system has
second order approximation so that locally constant and locally linear
functions are in $V_j$.

Another example of orthogonal multiwavelets is shown in Figure 2 [5,6,7],
where the multiscaling functions are shown in (a) and (b), and the multiwavelet
functions in (c) and (d). The two scaling functions in each cardinal balanced
multiwavelet system are the same function up to a half-integer shift in time,
and the wavelets are likewise the same up to a half-integer shift in time. The
approximation orders of the cardinal balanced orthogonal multiwavelet systems
are 2 for cardinal 2-balanced, 3 for cardinal 3-balanced, and 4 for cardinal
4-balanced systems. The cardinal 2-balanced orthogonal multiwavelet filters
are given by



$$H(z) = \begin{bmatrix} b(z) & 0.5\,z^{-1} \\ z^{-5} b(-1/z) & 0.5\,z^{-2} \end{bmatrix}, \qquad
G(z) = \begin{bmatrix} -b(z) & 0.5\,z^{-1} \\ -z^{-5} b(-1/z) & 0.5\,z^{-2} \end{bmatrix}, \tag{12}$$

where $b(z) = 0.015625 + 0.123015364784490\,z^{-1} + 0.46875\,z^{-2} - 0.121030729568979\,z^{-3}
+ 0.015625\,z^{-4} - 0.001984635215512\,z^{-5}$. For more details on cardinal balanced
orthogonal multiwavelets, refer to the paper written by I. Selesnick [6].
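The printed coefficients of $b(z)$ can be checked numerically. The sketch below is only a sanity check under the assumption that the second-column entries of $H(z)$ have magnitude 0.5; it verifies that $b(1) = 0.5$, that the even-lag autocorrelations of $b$ vanish, and that each row of $H(z)$ therefore has squared norm 1/2, as orthogonality requires in the normalization of (8). The helper name `autocorrelation` is hypothetical.

```python
# Coefficients of b(z) = sum_k b[k] z^{-k} as printed in the text.
b = [
    0.015625,
    0.123015364784490,
    0.46875,
    -0.121030729568979,
    0.015625,
    -0.001984635215512,
]


def autocorrelation(seq, lag):
    """Return sum_k seq[k] * seq[k + lag]."""
    return sum(seq[k] * seq[k + lag] for k in range(len(seq) - lag))


dc_gain = sum(b)                      # b(1)
energy = autocorrelation(b, 0)        # sum of squares of b
lag2 = autocorrelation(b, 2)          # shift by one vector sample
lag4 = autocorrelation(b, 4)          # shift by two vector samples
# Squared norm of a row [b(z), 0.5 z^{-p}] of H(z); independent of the delay p.
row_energy = energy + 0.5 ** 2
```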

We should note that a scalar system with one scaling function cannot com- 
bine symmetry, orthogonality, and the second order approximation together. 
Furthermore, the solution of a scalar refinement equation with four coefficients 
is supported on the interval [0,3], while multiscaling functions with four matrix 
coefficients can be supported on a shorter interval. 






Since all elements of both $\phi(2x)$ and $\phi(2x - 1)$ are in $V_1$ and $V_1 = V_0 \dot{+} W_0$,
there exist two $\ell^2$ matrix sequences $\{\tilde{H}_n\}_{n\in\mathbb{Z}}$ and $\{\tilde{G}_n\}_{n\in\mathbb{Z}}$ such that

$$\phi(2x - k) = \sum_{n \in \mathbb{Z}} \left[ \tilde{H}^T_{k-2n}\, \phi(x - n) + \tilde{G}^T_{k-2n}\, \psi(x - n) \right], \quad \forall k \in \mathbb{Z}, \tag{13}$$

which is called the decomposition relation of $\phi$ and $\psi$.¹

We have two pairs of sequences $(\{H_n\}, \{G_n\})$ and $(\{\tilde{H}_n\}, \{\tilde{G}_n\})$, which are
unique due to the direct sum relationship $V_1 = V_0 \dot{+} W_0$. A carefully chosen pair of
sequences $(\{H_n\}, \{G_n\})$ can generate multiscaling functions and multiwavelets
and thus multiwavelet subspaces; hence, they can completely characterize a
multiwavelet analysis.



2 Multiwavelet Decomposition and Reconstruction 



From the formulas (8), (9), and (13), the following signal decomposition and
reconstruction algorithms can be derived. Let $v_j \in V_j$ and $w_j \in W_j$ so that

$$v_j(x) := \sum_{k \in \mathbb{Z}} c_{j,k} \cdot \phi(2^j x - k) = \sum_{k \in \mathbb{Z}} c_{j,k}^T\, \phi(2^j x - k); \tag{14}$$

$$w_j(x) := \sum_{k \in \mathbb{Z}} d_{j,k} \cdot \psi(2^j x - k) = \sum_{k \in \mathbb{Z}} d_{j,k}^T\, \psi(2^j x - k), \tag{15}$$

where $\cdot$ denotes a dot product between two vectors and $^T$ denotes the transpose
operator. The scale factor $2^{j/2}$ is not explicitly shown here for simplicity but is
incorporated into the sequences $c_{j,k}$ and $d_{j,k}$. By the relation $V_j = V_{j-1} \dot{+} W_{j-1}$,

$$v_j(x) := v_{j-1}(x) + w_{j-1}(x) \tag{16}$$
$$\phantom{v_j(x) :}= \sum_{k \in \mathbb{Z}} c_{j-1,k} \cdot \phi(2^{j-1} x - k) + \sum_{k \in \mathbb{Z}} d_{j-1,k} \cdot \psi(2^{j-1} x - k), \quad \forall j \in \mathbb{Z}.$$

Thus we have the following recursive decomposition (analysis) formulas:

$$c_{j-1,k} = \sum_n \tilde{H}_{n-2k}\, c_{j,n} = \sum_n \tilde{H}_{-n}\, c_{j,2k-n}, \quad \forall j \in \mathbb{Z}; \tag{17}$$

$$d_{j-1,k} = \sum_n \tilde{G}_{n-2k}\, c_{j,n} = \sum_n \tilde{G}_{-n}\, c_{j,2k-n}, \quad \forall j \in \mathbb{Z}. \tag{18}$$

An original data sequence $c_0$ ($= \{c_{0,k}\}_k$) is decomposed into the $c_{-1}$ and $d_{-1}$ data
sequences, and the sequence $c_{-1}$ is further decomposed into the $c_{-2}$ and $d_{-2}$ sequences,
and so on. Continuing this process recursively, the original sequence $c_0$ is decomposed
into $d_{-1}, d_{-2}, d_{-3}, \ldots$. Note that this process reduces the data size
by half for each decomposed sequence but it conserves the total data size.

¹ We have intentionally transposed the matrices $\tilde{H}$ and $\tilde{G}$ and used the index
$k - 2n$ instead of $2n - k$, for convenience in representing formulas of dual relationship.







(b) Multiwavelet filterbanks by reverse indexing 



Fig. 3. The multiwavelet transform filter banks. Filters are r x r matrices and 
data paths are r lines, where r = 2 in our examples. The multiwavelet systems 
(a) and (b) are equivalent, except that filter indices are all reversed between the 
two systems 



Let $D_K$, $K > 1$, be the subsampling (downsampling) operator defined by

$$(D_K x)[n] := x[Kn], \tag{19}$$

where $K$ is a subsampling rate and $x$ is a sequence of vector-valued samples.
The decomposition formulas can be rewritten in the Z-transform domain as

$$c_{j-1}(z) = D_2 \tilde{H}^{-}(z)\, c_j(z), \tag{20}$$
$$d_{j-1}(z) = D_2 \tilde{G}^{-}(z)\, c_j(z), \tag{21}$$

where the superscript $^{-}$ denotes reverse indexing, i.e., $\tilde{H}^{-}_n := \tilde{H}_{-n}$.

From the two-scale relations (8), (9) and from (14), (15), we have the
following recursive reconstruction (synthesis) formula:

$$c_{j,k} = 2 \sum_n \left( H^T_{k-2n}\, c_{j-1,n} + G^T_{k-2n}\, d_{j-1,n} \right). \tag{22}$$

Let $U_K$, $K > 1$, be the upsampling operator defined by

$$(U_K x)[n] := \begin{cases} x[n/K], & \text{if } n/K \text{ is an integer}, \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$

where $K$ is an upsampling rate and $x$ is a sequence of vector-valued samples.
Then the reconstruction formula can be rewritten in the Z-transform domain as

$$c_j(z) = 2 \left[ H^T(z)\, U_2\, c_{j-1}(z) + G^T(z)\, U_2\, d_{j-1}(z) \right]. \tag{24}$$
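One level of the analysis formulas (17), (18) and of the synthesis formula (22) can be sketched in a few lines. The sketch uses the GHM coefficients, for which the analysis and synthesis filters coincide, and assumes periodic extension of the data at the boundaries, which the text does not specify. The function names and the test sequence are hypothetical.

```python
from math import sqrt

S = sqrt(2.0)
# GHM coefficients in the normalization phi(x) = 2 * sum_n H_n phi(2x - n).
H = [[[3 / 10, 4 * S / 10], [-S / 40, -3 / 20]],
     [[3 / 10, 0.0], [9 * S / 40, 1 / 2]],
     [[0.0, 0.0], [9 * S / 40, -3 / 20]],
     [[0.0, 0.0], [-S / 40, 0.0]]]
G = [[[-S / 40, -3 / 20], [1 / 20, 3 * S / 20]],
     [[9 * S / 40, -1 / 2], [-9 / 20, 0.0]],
     [[9 * S / 40, -3 / 20], [9 / 20, -3 * S / 20]],
     [[-S / 40, 0.0], [-1 / 20, 0.0]]]


def mat_vec(m, v):
    """2x2 matrix times a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def mat_t_vec(m, v):
    """Transpose of a 2x2 matrix times a length-2 vector."""
    return [m[0][0] * v[0] + m[1][0] * v[1], m[0][1] * v[0] + m[1][1] * v[1]]


def decompose(c):
    """Analysis: coarse[k] = sum_i H_i c[2k + i], detail[k] = sum_i G_i c[2k + i]."""
    n_vec = len(c)
    coarse, detail = [], []
    for k in range(n_vec // 2):
        lo, hi = [0.0, 0.0], [0.0, 0.0]
        for i in range(len(H)):
            v = c[(2 * k + i) % n_vec]      # periodic extension (assumption)
            a, b = mat_vec(H[i], v), mat_vec(G[i], v)
            lo = [lo[0] + a[0], lo[1] + a[1]]
            hi = [hi[0] + b[0], hi[1] + b[1]]
        coarse.append(lo)
        detail.append(hi)
    return coarse, detail


def reconstruct(coarse, detail):
    """Synthesis: c[2n + i] += 2 * (H_i^T coarse[n] + G_i^T detail[n])."""
    n_vec = 2 * len(coarse)
    c = [[0.0, 0.0] for _ in range(n_vec)]
    for n in range(len(coarse)):
        for i in range(len(H)):
            a, b = mat_t_vec(H[i], coarse[n]), mat_t_vec(G[i], detail[n])
            k = (2 * n + i) % n_vec
            c[k][0] += 2.0 * (a[0] + b[0])
            c[k][1] += 2.0 * (a[1] + b[1])
    return c


# Hypothetical vector-valued input of 8 samples (r = 2).
c0 = [[float(n), float((n * n) % 5)] for n in range(8)]
coarse, detail = decompose(c0)
c0_rec = reconstruct(coarse, detail)
max_error = max(abs(x - y) for u, v in zip(c0, c0_rec) for x, y in zip(u, v))
```

Each output sequence has half as many vector samples as the input, so the total data size is conserved, and the reconstruction error is at the level of floating-point rounding.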

The decomposition and reconstruction systems implemented by multiwavelet
filterbanks are shown in Figure 3, where the system (a) is the exact
implementation of the equations derived above. If we take reverse indexing for
all filters, we have the system (b), and the multiwavelet decomposition
formulas become

$$c_{j-1}(z) = D_2 \tilde{H}(z)\, c_j(z), \tag{25}$$
$$d_{j-1}(z) = D_2 \tilde{G}(z)\, c_j(z), \tag{26}$$






and the reconstruction formula becomes

$$c_j(z) = 2 \left[ H^*(z)\, U_2\, c_{j-1}(z) + G^*(z)\, U_2\, d_{j-1}(z) \right]. \tag{27}$$

Note that the input data $c_j$ is a sequence of vector-valued data, every data
path has $r$ lines, and filters are $r \times r$ matrices. We restrict $r = 2$ in this study.
Constructing a vector-valued sequence $c_j$ from a signal or an image is nontrivial.
When a 1-D input signal is vectorized, the direction of filter indexing will affect
the reconstructed signal in an undesirable way if the vectorization scheme does
not match the filter indexing. This effect does not occur in a scalar wavelet
system, whose filters are not matrices. As we do not take reverse indexing for
data sequences, we use the system (a) of Figure 3 in our implementation.
A prefilter for the chosen input scheme will be designed later in Section 5.



3 Biorthogonality and Perfect Reconstruction Condition 



From the two-scale dilation equations (8), (9), and the decomposition relation
(13), we have the following biorthogonality conditions:

$$\tilde{H}(z) H^*(z) + \tilde{H}(-z) H^*(-z) = I_r; \tag{28}$$
$$\tilde{H}(z) G^*(z) + \tilde{H}(-z) G^*(-z) = 0_r; \tag{29}$$
$$\tilde{G}(z) H^*(z) + \tilde{G}(-z) H^*(-z) = 0_r; \tag{30}$$
$$\tilde{G}(z) G^*(z) + \tilde{G}(-z) G^*(-z) = I_r, \tag{31}$$

which completely characterize the biorthogonality between the analysis filter
pair $(\tilde{H}, \tilde{G})$ and the synthesis filter pair $(H, G)$. (Namely, $\tilde{H} \perp G$ and $H \perp \tilde{G}$.)
Let $H_m(z)$ denote the modulation matrix² of $(H, G)$ as defined by

$$H_m(z) := \begin{bmatrix} H(z) & H(-z) \\ G(z) & G(-z) \end{bmatrix} \tag{32}$$

and let $\tilde{H}_m(z)$ denote the modulation matrix of $(\tilde{H}, \tilde{G})$ similarly defined; then the
above biorthogonality condition becomes

$$\tilde{H}_m(z) H_m^*(z) = \begin{bmatrix} \tilde{H}(z) & \tilde{H}(-z) \\ \tilde{G}(z) & \tilde{G}(-z) \end{bmatrix}
\begin{bmatrix} H^*(z) & G^*(z) \\ H^*(-z) & G^*(-z) \end{bmatrix}
= \begin{bmatrix} I_r & 0 \\ 0 & I_r \end{bmatrix}. \tag{33}$$

From the decomposition and reconstruction formulas (20), (21) and (24), we
have the following perfect reconstruction (PR) condition:

$$H_m^*(z) \tilde{H}_m(z) = c\, I_{2r}, \tag{34}$$

where $c$ is a non-zero constant (a scale change in the reconstructed signal is
allowed).

² The modulation matrix is also called the AC (alias component) matrix [8].






Theorem 1. For two matrix filter pairs $(H, G)$ and $(\tilde{H}, \tilde{G})$, the modulation
matrices $H_m(z)$ and $\tilde{H}_m(z)$ are defined by

$$H_m(z) := \begin{bmatrix} H(z) & H(-z) \\ G(z) & G(-z) \end{bmatrix}, \qquad
\tilde{H}_m(z) := \begin{bmatrix} \tilde{H}(z) & \tilde{H}(-z) \\ \tilde{G}(z) & \tilde{G}(-z) \end{bmatrix}. \tag{35}$$

Then

$$\tilde{H}_m(z) H_m^*(z) = H_m^*(z) \tilde{H}_m(z) = c\, I_{2r}, \tag{36}$$

where $c$ is a nonzero constant, is the necessary and sufficient condition for the
two matrix filter pairs $(H, G)$ and $(\tilde{H}, \tilde{G})$ to be biorthogonal and to ensure
perfect reconstruction. If these filter pairs generate multiscaling functions and
multiwavelets, then they are biorthogonal.

For orthogonal filter pairs, we have $\tilde{H} = H$ and $\tilde{G} = G$, and then

$$H_m(z) H_m^*(z) = H_m^*(z) H_m(z) = c\, I_{2r}. \tag{37}$$

Hence, $H_m(z)$ is paraunitary (lossless), i.e., unitary for all $z$ on the unit circle.
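The paraunitary property can be checked numerically for the GHM system. The sketch below evaluates the modulation matrix at a few points on the unit circle and measures how far $H_m(z) H_m^*(z)$ is from the identity; with the normalization of (8) the constant comes out as $c = 1$. The function names and the choice of 16 sample points are assumptions made for the illustration.

```python
import cmath
from math import pi, sqrt

S = sqrt(2.0)
# GHM coefficients in the normalization phi(x) = 2 * sum_n H_n phi(2x - n).
H = [[[3 / 10, 4 * S / 10], [-S / 40, -3 / 20]],
     [[3 / 10, 0.0], [9 * S / 40, 1 / 2]],
     [[0.0, 0.0], [9 * S / 40, -3 / 20]],
     [[0.0, 0.0], [-S / 40, 0.0]]]
G = [[[-S / 40, -3 / 20], [1 / 20, 3 * S / 20]],
     [[9 * S / 40, -1 / 2], [-9 / 20, 0.0]],
     [[9 * S / 40, -3 / 20], [9 / 20, -3 * S / 20]],
     [[-S / 40, 0.0], [-1 / 20, 0.0]]]


def filter_response(coeffs, z):
    """Evaluate F(z) = sum_n F_n z^{-n} for 2x2 matrix coefficients."""
    out = [[0j, 0j], [0j, 0j]]
    for n, f in enumerate(coeffs):
        w = z ** (-n)
        for i in range(2):
            for j in range(2):
                out[i][j] += f[i][j] * w
    return out


def modulation_matrix(z):
    """Return the 4x4 matrix [[H(z), H(-z)], [G(z), G(-z)]] as a list of rows."""
    hz, hmz = filter_response(H, z), filter_response(H, -z)
    gz, gmz = filter_response(G, z), filter_response(G, -z)
    return [hz[0] + hmz[0], hz[1] + hmz[1], gz[0] + gmz[0], gz[1] + gmz[1]]


def times_conjugate_transpose(m):
    """Return m @ m^* for a square complex matrix."""
    size = len(m)
    return [[sum(m[i][k] * m[j][k].conjugate() for k in range(size))
             for j in range(size)] for i in range(size)]


def deviation_from_identity(z):
    """Largest entry of |H_m(z) H_m^*(z) - I|."""
    prod = times_conjugate_transpose(modulation_matrix(z))
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4))


worst = max(deviation_from_identity(cmath.exp(2j * pi * k / 16)) for k in range(16))
```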



4 Construction of Biorthogonal Multiwavelets 



Plonka and Strela constructed biorthogonal Hermite cubic (piecewise cubic
polynomial) multiscaling functions and multiwavelets using the cofactor method
[9,10]. The coefficient matrix

$$H(z) = \frac{1}{16} \begin{bmatrix} 4(1 + z^{-1})^2 & -2(1 - z^{-1})(1 + z^{-1}) \\
3(1 - z^{-1})(1 + z^{-1}) & -1 + 4z^{-1} - z^{-2} \end{bmatrix} \tag{38}$$

generates Hermite cubic multiscaling functions, where $\det H(z) = (1 + z^{-1})^4 / 128$.
A possible choice of $\tilde{H}$ for dual functions is

$$\tilde{H}(z) = \frac{1}{32} \begin{bmatrix} z - 8 + 18z^{-1} - 8z^{-2} + z^{-3} & -3z + 12 - 12z^{-2} + 3z^{-3} \\
2z - 8 + 8z^{-2} - 2z^{-3} & -4z + 8 + 24z^{-1} + 8z^{-2} - 4z^{-3} \end{bmatrix}. \tag{39}$$

By the biorthogonality conditions, we have

$$G(z) = \frac{1}{16} \begin{bmatrix} -4(1 - z^{-1})^2 & 6(1 - z^{-1})(1 + z^{-1}) \\
-(1 - z^{-1})(1 + z^{-1}) & 1 + 4z^{-1} + z^{-2} \end{bmatrix} \tag{40}$$

and by the cofactor method,

$$\tilde{G}(z) = \frac{1}{32} \begin{bmatrix} 1 + 8z^{-1} + 18z^{-2} + 8z^{-3} + z^{-4} & -1 - 4z^{-1} + 4z^{-3} + z^{-4} \\
6 + 24z^{-1} - 24z^{-3} - 6z^{-4} & -4 - 8z^{-1} + 24z^{-2} - 8z^{-3} - 4z^{-4} \end{bmatrix}. \tag{41}$$

The Hermite cubic multiscaling functions and multiwavelets generated by $H$ and
$G$ are shown in Figure 4 (a)-(d). Their corresponding biorthogonal multiscaling
functions and multiwavelets are shown in Figure 4 (e)-(h).
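The lowpass pair $H(z)$, $\tilde{H}(z)$ of (38) and (39) can be checked with exact rational arithmetic. The sketch below stores each entry as a Laurent polynomial (a dictionary from powers of $z$ to coefficients), forms $\tilde{H}(z) H^T(1/z) + \tilde{H}(-z) H^T(-1/z)$, and also expands $\det H(z)$. With the coefficients as printed, the result is a constant multiple of the identity, $I_r/8$, which corresponds to the scaled form of the condition in Theorem 1. All helper names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two Laurent polynomials stored as {power of z: coefficient}."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] = out.get(a + b, 0) + ca * cb
    return out


def poly_add(p, q):
    """Add two Laurent polynomials."""
    out = dict(p)
    for a, ca in q.items():
        out[a] = out.get(a, 0) + ca
    return out


def scale(p, s):
    """Multiply every coefficient by the scalar s."""
    return {a: s * c for a, c in p.items()}


def reverse(p):
    """Replace z by 1/z."""
    return {-a: c for a, c in p.items()}


def even_part_times_two(p):
    """p(z) + p(-z): doubles even powers, cancels odd ones; zero terms are dropped."""
    return {a: 2 * c for a, c in p.items() if a % 2 == 0 and c != 0}


s16, s32 = Fraction(1, 16), Fraction(1, 32)
# H(z) of (38); dictionary keys are powers of z.
H = [[scale({0: 4, -1: 8, -2: 4}, s16), scale({0: -2, -2: 2}, s16)],
     [scale({0: 3, -2: -3}, s16), scale({0: -1, -1: 4, -2: -1}, s16)]]
# Dual filter of (39).
Hd = [[scale({1: 1, 0: -8, -1: 18, -2: -8, -3: 1}, s32),
       scale({1: -3, 0: 12, -2: -12, -3: 3}, s32)],
      [scale({1: 2, 0: -8, -2: 8, -3: -2}, s32),
       scale({1: -4, 0: 8, -1: 24, -2: 8, -3: -4}, s32)]]

# Entry (i, j) of Hd(z) H^T(1/z) + Hd(-z) H^T(-1/z).
biorth = [[even_part_times_two(poly_add(poly_mul(Hd[i][0], reverse(H[j][0])),
                                        poly_mul(Hd[i][1], reverse(H[j][1]))))
           for j in range(2)] for i in range(2)]

# det H(z) = H11 * H22 - H12 * H21, with zero terms dropped.
det_H = poly_add(poly_mul(H[0][0], H[1][1]), scale(poly_mul(H[0][1], H[1][0]), -1))
det_H = {a: c for a, c in det_H.items() if c != 0}
```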







Fig. 4. Hermite cubics and their dual multiwavelets 



5 Preconditioning Multiwavelet Systems 

In this section we consider multiwavelet systems that analyze discrete data, and 
investigate how to precondition a multiwavelet system by prefiltering input data, 
which is not necessary for the case of single (or scalar) wavelet systems. 



5.1 Prefilters and Postfilters 

Consider the multiwavelet series expansion:

$$f_j(t) := \sum_k c_{j,k}^T\, \phi(2^j t - k). \tag{42}$$

From a given 1-D signal $x[n]$, construct a vector-valued sequence $\mathbf{x}[n]$ by

$$\mathbf{x}[n] := \begin{bmatrix} x[nr] \\ \vdots \\ x[nr + r - 1] \end{bmatrix}, \quad r > 1. \tag{43}$$

Let us define a prefilter $Q(z)$, which maps a vector-valued sequence space onto
itself, such that the coefficient vector sequence $c_{0,k}$ is obtained by filtering $\mathbf{x}[n]$:

$$c_0(z) = Q(z)\, \mathbf{x}(z). \tag{44}$$

For any $j \le 0$, $c_{j,k}$ is decomposed to $\{c_{j-1,k}, d_{j-1,k}\}$ by a layer of multiwavelet
decomposition. Recursive multiwavelet decompositions down to a resolution level
$J < 0$ give us a set of decomposed data sequences $c_{J,k}$ and $\{d_{j,k}\}_{J \le j < 0}$.
Recursive multiwavelet reconstruction from the decomposed data set gives the original
coefficient vector $c_{0,k}$. Then $\mathbf{x}(z)$ is reconstructed by applying a postfilter $P(z)$:

$$\mathbf{x}(z) = P(z)\, c_0(z). \tag{45}$$









Fig. 5. Prefilter and postfilter blocks: (a) prefilter, (b) postfilter. A unit
delay and downsampling in a prefilter block (a) vectorize the 1-D input data
sequence $x_j[n]$ to a vector-valued sequence, where the prefilter output
$[c_j^1[n]\ \ c_j^2[n]]^T$ is the input to the multiwavelet decomposition filter
banks. A unit delay and upsampling in a postfilter block (b) serialize the
two-channel postfilter output vector sequence to the 1-D output signal $x_j[n]$,
where $[c_j^1[n]\ \ c_j^2[n]]^T$ are from the outputs of the multiwavelet
reconstruction filter banks



The postfilter $P$ must be an inverse of the prefilter $Q$ up to some unit delays for
perfect reconstruction:

$$P(z) Q(z) = z^{-l} I, \quad \text{for some integer } l. \tag{46}$$

We may assume $l = 0$ (no delay) for convenience.

Define

$$\mathbf{x}_0(z) := \mathbf{x}(z) \quad \text{and} \quad \mathbf{x}_j(z) := P(z)\, c_j(z). \tag{47}$$

Then $\{\mathbf{x}_j\}_{j<0}$ are the projections of $\mathbf{x}$ into (discrete-time) multiscaling spaces at
lower resolutions. This implies that a postfilter should be applied to a coefficient
vector $c_j$ if we want to see a decomposed signal at the resolution level $j < 0$.

For an $r$-channel multiwavelet system, the construction of a vector-valued
input sequence from a 1-D signal can be implemented in a prefilter block by
serial-to-parallel conversion (vectorization) using $r - 1$ unit delays and then
downsampling each channel at the rate $r$. The block diagrams of the prefilter and
postfilter blocks for a 2-channel multiwavelet system are shown in Figure 5.
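The vectorization (43) and the prefilter and postfilter blocks can be sketched as follows. The matrix filter used here is a deliberately trivial, hypothetical pair (a channel swap that is its own inverse), chosen only so that $P(z) Q(z) = I$ is obvious; it is not one of the prefilters designed in this paper. The function names are assumptions as well.

```python
def vectorize(signal, r=2):
    """Serial-to-parallel conversion: x[n] = [x[nr], ..., x[nr + r - 1]]."""
    if len(signal) % r != 0:
        raise ValueError("signal length must be a multiple of r")
    return [list(signal[n * r:(n + 1) * r]) for n in range(len(signal) // r)]


def serialize(vectors):
    """Parallel-to-serial conversion, the inverse of vectorize."""
    return [sample for vec in vectors for sample in vec]


def apply_matrix_filter(taps, vectors):
    """Causal FIR matrix filtering y[n] = sum_k taps[k] x[n - k] (zero initial state)."""
    r = len(vectors[0])
    out = []
    for n in range(len(vectors)):
        y = [0.0] * r
        for k, tap in enumerate(taps):
            if n - k >= 0:
                x = vectors[n - k]
                for i in range(r):
                    y[i] += sum(tap[i][j] * x[j] for j in range(r))
        out.append(y)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_vec = vectorize(signal)
# Hypothetical prefilter/postfilter pair: Q swaps the two channels, P swaps them back.
swap = [[[0.0, 1.0], [1.0, 0.0]]]
c0 = apply_matrix_filter(swap, x_vec)                 # prefilter output
restored = serialize(apply_matrix_filter(swap, c0))   # postfilter, then serialize
```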

5.2 Interpolation Prefilter and Postfilter 

In the multiwavelet case, in order to avoid the undesirable visual effect, we need
a prefilter that computes the multiscaling coefficient sequence $c_{0,k}$ from a
discrete-time input signal before starting the multiwavelet decomposition
[11, 12, 13]. In this section, we develop a process of finding a pair of
prefilter and postfilter such that

$$f_0(t) := \sum_k c_{0,k}^T\, \phi(t - k) \tag{48}$$

interpolates an original signal $x_0[n]$. Since we have $r$ scaling functions, a
continuous-time signal $f_0(t)$ is sampled at the interval of $1/r$ at the 0-th
resolution level:

$$f_0\!\left(\frac{n}{r}\right) = \sum_{k \in \mathbb{Z}} c_{0,k}^T\, \phi\!\left(\frac{n}{r} - k\right)
= \sum_{k \in \mathbb{Z}} \phi\!\left(\frac{n}{r} - k\right)^T c_{0,k}, \tag{49}$$






and we impose an interpolation property by $f_0(n/r) = x_0[n]$. We construct
vector-valued sequences $\mathbf{f}_0[n]$ and $\mathbf{x}_0[n]$ from the sampled sequence $f_0(n/r)$ and the
1-D signal $x_0[n]$, respectively:

$$\mathbf{f}_0[n] := \begin{bmatrix} f_0(n) \\ f_0(n + \frac{1}{r}) \\ \vdots \\ f_0(n + \frac{r-1}{r}) \end{bmatrix}, \qquad
\mathbf{x}_0[n] := \begin{bmatrix} x_0[nr] \\ x_0[nr + 1] \\ \vdots \\ x_0[nr + r - 1] \end{bmatrix}; \tag{50}$$

then the interpolation condition $f_0(n/r) = x_0[n]$ gives the following relation:

$$\mathbf{f}_0[n] = \mathbf{x}_0[n] = \sum_{k \in \mathbb{Z}} P_{n-k}\, c_0[k] = \sum_{k \in \mathbb{Z}} P_k\, c_0[n - k], \tag{51}$$

where $P_n$ is an $r \times r$ matrix sequence defined by

$$P_n := \begin{bmatrix} \phi(n)^T \\ \phi(n + \frac{1}{r})^T \\ \vdots \\ \phi(n + \frac{r-1}{r})^T \end{bmatrix}. \tag{52}$$

This is an interpolation postfilter that maps the space of scaling coefficients $c_j[k]$
to the space of sampled signals $\mathbf{f}_j[n]$. At any resolution level $j$, a decomposed
signal can be obtained by filtering the scaling coefficients $c_j[k]$ by the postfilter $P_n$:

$$\mathbf{x}_j[n] = \sum_{k \in \mathbb{Z}} P_{n-k}\, c_j[k] = \sum_{k \in \mathbb{Z}} P_k\, c_j[n - k]. \tag{53}$$

This relation is expressed in the Z-transform domain as

$$\mathbf{x}_j(z) = P(z)\, c_j(z), \tag{54}$$

where $P(z) := \sum_n P_n z^{-n}$. By (52), $P_n$ is a finite sequence (FIR filter) if the
scaling vector function $\phi$ is compactly supported.

We define a prefilter $Q(z)$ such that

$$Q(z) P(z) = P(z) Q(z) = I_r. \tag{55}$$

Then the scaling coefficient $c_j(z)$ is obtained by filtering the signal $\mathbf{x}_j(z)$:

$$c_j(z) = Q(z)\, \mathbf{x}_j(z). \tag{56}$$

To have an FIR solution to the above condition (55), $\det(P(z))$ must have the
form $\det(P(z)) = a z^{-l}$, where $a$ is a constant and $l$ is an integer.

For the GHM orthogonal multiwavelet system, an interpolation postfilter $P$
is obtained from the GHM scaling functions (Figure 1(a) and (b)):

$$P(z) = \begin{bmatrix} 0 & 1.73210618015\, z^{-1} \\
1.95965444133 & -0.519631854046 - 0.519631854046\, z^{-1} \end{bmatrix}. \tag{57}$$





The corresponding prefilter $Q$ is computed from the condition $P(z) Q(z) = I$,

$$Q(z) = P^{-1}(z) = \begin{bmatrix} 0.1530923245\, z + 0.1530923245 & 0.5103077369 \\
0.5773517497\, z & 0 \end{bmatrix}. \tag{58}$$
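The inverse relation between the GHM postfilter (57) and prefilter (58) can be checked numerically by evaluating both on the unit circle. Since the printed coefficients are finite-precision values, the product matches the identity only approximately; the tolerance, the number of sample points, and the function names in this sketch are assumptions.

```python
import cmath
from math import pi


def P(z):
    """Interpolation postfilter (57), evaluated at a complex point z."""
    return [[0.0, 1.73210618015 / z],
            [1.95965444133, -0.519631854046 - 0.519631854046 / z]]


def Q(z):
    """Prefilter (58), evaluated at a complex point z."""
    return [[0.1530923245 * z + 0.1530923245, 0.5103077369],
            [0.5773517497 * z, 0.0]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def max_deviation(num_points=32):
    """Largest entry of |P(z) Q(z) - I| over equally spaced points on the unit circle."""
    worst = 0.0
    for k in range(num_points):
        z = cmath.exp(2j * pi * k / num_points)
        prod = matmul2(P(z), Q(z))
        for i in range(2):
            for j in range(2):
                worst = max(worst, abs(prod[i][j] - (1.0 if i == j else 0.0)))
    return worst


deviation = max_deviation()
```

With the printed digits the deviation stays below $10^{-4}$, so the pair is an inverse pair up to the precision of the tabulated values.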



For the cardinal 2-balanced (also 3-balanced or 4-balanced) orthogonal
multiwavelet system, we obtain the postfilter and prefilter as

$$P(z) = \begin{bmatrix} 0 & \sqrt{2}\, z^{-2} \\ \sqrt{2}\, z^{-1} & 0 \end{bmatrix}, \qquad
Q(z) = \begin{bmatrix} 0 & z/\sqrt{2} \\ z^{2}/\sqrt{2} & 0 \end{bmatrix}. \tag{59}$$

The biorthogonal Hermite cubic multiwavelet system does not give a stable
prefilter for an interpolation postfilter. In this case, we need to design a
different pair of prefilter and postfilter for the system. One possible
solution is to design an orthogonal prefilter.



5.3 Orthogonal Prefilter 

A prefilter $Q(z) := \sum_n Q_n z^{-n}$ is said to be orthogonal if

$$\| Q * c \| = \| c \| \tag{60}$$

for all $c \in \ell^2(\mathbb{Z})^r$, where $Q$ is an impulse response (a sequence of $r \times r$ matrices)
of $Q(z)$ and $*$ denotes a discrete (matrix) convolution operator. The above
condition $\|Q(z) c(z)\| = \|c(z)\|$ is equivalent to the paraunitary condition of $Q(z)$:

$$Q(z)\, Q(z^{-1})^T = I. \tag{61}$$

An FIR filter $Q(z)$ is paraunitary if and only if it is of the form

$$Q(z) = Q(1) \prod_{i=1}^{N} \left( I - P_i + P_i z^{\epsilon_i} \right), \tag{62}$$

where $Q(1)$ is an orthogonal (unitary) matrix, $\epsilon_i = \pm 1$, and $P_i$ for $i = 1, \ldots, N$ are
orthogonal projection matrices [8]. Higher approximation orders will give quite
complex relations, so here we consider a prefilter only up to the approximation
order 2. Then, for a minimal filter length ($N = 2$), we need to find $P_1$ and $P_2$ such
that $Q(z) = Q(1)(I - P_1 + P_1 z)(I - P_2 + P_2 z)$ satisfies the above orthogonality
condition. A delay factor $z^{-2}$ may be introduced to make $Q(z)$ causal. An
example of an orthogonal prefilter of approximation order 2 for the GHM orthogonal
multiwavelet system is given by $Q(z) := Q_0 + Q_1 z^{-1}$, where

$$Q_0 = \begin{bmatrix} 0.11942337067748 & 0.99158171438258 \\ 0.04967860804828 & -0.00598315472909 \end{bmatrix}, \qquad
Q_1 = \begin{bmatrix} -0.00598315472909 & -0.04967860804828 \\ 0.9915817143825 & -0.11942337067748 \end{bmatrix}. \tag{63}$$
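The norm-preserving property (60) of a paraunitary factor of the form (62) can be illustrated with a single degree-one building block. In this sketch $P = v v^T$ is a rank-one orthogonal projection with a unit vector $v$, and the filter is written in the causal form $Q(z) = U (I - P + P z^{-1})$; the angles `theta` and `rotation`, the test sequence, and the use of circular (periodic) convolution are all hypothetical choices made for the illustration, not the prefilter designed in the text.

```python
from math import cos, sin, sqrt


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def degree_one_prefilter(theta, rotation):
    """Taps (Q_0, Q_1) of Q(z) = U (I - P + P z^{-1}) with P = v v^T, |v| = 1."""
    v = (cos(theta), sin(theta))
    proj = [[v[i] * v[j] for j in range(2)] for i in range(2)]
    comp = [[(1.0 if i == j else 0.0) - proj[i][j] for j in range(2)]
            for i in range(2)]
    u = [[cos(rotation), -sin(rotation)], [sin(rotation), cos(rotation)]]
    return matmul2(u, comp), matmul2(u, proj)


def filter_vectors(q0, q1, c):
    """Circular matrix convolution (Q * c)[n] = Q_0 c[n] + Q_1 c[n - 1]."""
    out = []
    for n in range(len(c)):
        cur, prev = c[n], c[n - 1]      # c[-1] wraps around (periodic extension)
        out.append([sum(q0[i][j] * cur[j] + q1[i][j] * prev[j] for j in range(2))
                    for i in range(2)])
    return out


def norm(c):
    """Euclidean norm of a vector-valued sequence."""
    return sqrt(sum(x * x for vec in c for x in vec))


q0, q1 = degree_one_prefilter(theta=0.3, rotation=1.1)
c = [[1.0, -2.0], [0.5, 3.0], [4.0, 0.0], [-1.5, 2.5]]
filtered = filter_vectors(q0, q1, c)
norm_in, norm_out = norm(c), norm(filtered)
```

Note that $Q_0 = U(I - P)$ is singular in this construction, which is a structural property of any paraunitary filter with two taps.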






Table 1. Compression performances of wavelet systems (PSNR [dB])

            Multiwavelets                                Single wavelets
            Orthogonal                       Biorth.     Orthogonal           Biorth.
CR          GHM (i)    GHM (o)    CardBal2   H-Cubics    D4         D6        Bin9-7
2           48.929     47.933     48.317     44.262      47.410     48.232    49.162
4           41.012     40.500     41.327     36.964      39.483     41.233    42.388
8           36.126     35.717     37.041     32.212      34.762     36.874    38.626
16          32.259     31.887     32.922     29.004      30.956     32.568    35.481
32          28.786     28.348     29.296     26.730      27.617     28.810    31.799
64          26.031     25.590     26.070     23.847      24.991     25.532    27.535
128         23.379     23.036     23.381     21.414      22.688     23.106    24.121
256         20.566     20.572     20.785     20.453      20.559     20.672    20.712
Prefilter   Inter.     Orth.      Inter.     Orth.       N/A        N/A       N/A



6 Compression Performances 



Multiwavelet systems have been explored for applications to data compression
and image processing [5,13,14,15,16]. With the prefilters and postfilters that we
have designed for multiwavelet systems, we have investigated the applications
of these systems to image compression and examined their compression
performances. Experimental studies are performed on the compression
performances of three multiwavelet systems (two orthogonal multiwavelets, GHM
and cardinal balanced, and one biorthogonal multiwavelet, Hermite cubics) in
comparison to some single wavelet systems (Daubechies' D4 and D6 orthogonal
wavelets and the binary 9-7 biorthogonal wavelet). We consider a simple
compression scheme with a uniform quantizer, which removes a certain number of
small values from the highpassed subimages but keeps the larger values to
achieve a specified compression ratio (CR). We used six test images
(512 x 512, 8-bit) of Lena, Airplane, Baboon, Peppers, Sailboat, and Wavy in our
experiments. Our experiments suggested that wavelet decomposition up to the 3rd
or 4th level would give a reasonably high compression ratio and a good
reconstruction.
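The coefficient-selection step of such a scheme can be sketched as below. This is a simplified illustration under stated assumptions: the compression ratio is interpreted as the ratio of the total number of coefficients to the number of retained coefficients, and the uniform quantizer and any coding stage are omitted. The function name `keep_largest` and the sample values are hypothetical.

```python
def keep_largest(coeffs, compression_ratio):
    """Zero all but the len(coeffs) / compression_ratio largest-magnitude values."""
    keep = max(1, int(len(coeffs) / compression_ratio))
    # Indices sorted by decreasing magnitude; ties keep the earlier index.
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


# Hypothetical highpass coefficients and a compression ratio of 4.
detail = [0.2, -7.5, 0.1, 3.0, -0.4, 12.0, 0.05, -1.1]
compressed = keep_largest(detail, compression_ratio=4)
```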

To describe the image fidelity, PSNR (peak signal-to-noise ratio) is defined
by

$$\mathrm{PSNR\ [dB]} := 20 \log_{10} \frac{255}{\sqrt{\dfrac{1}{MN} \displaystyle\sum_{i=1}^{M} \sum_{j=1}^{N} (f_{ij} - s_{ij})^2}}, \tag{64}$$

where $f$ is an $M \times N$ noisy or distorted image (decompressed or reconstructed
image) and $s$ is the $M \times N$ original image. The PSNR values shown in Table 1
are the average values taken from the experimental results for the six test images
at each given compression ratio. The image compression performances of
orthogonal wavelet systems are shown in Figure 6(b) and those of some
biorthogonal wavelet systems in Figure 6(a). Among orthogonal wavelet systems,
multiwavelet systems perform better than single wavelet systems with comparable
support lengths.
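A direct implementation of the PSNR measure is short. The sketch assumes 8-bit images with a peak value of 255 and images stored as lists of rows; the function name `psnr` and the tiny test images are hypothetical.

```python
from math import log10, sqrt


def psnr(original, distorted, peak=255.0):
    """PSNR in dB: 20 * log10(peak / RMSE) for two equally sized images."""
    rows, cols = len(original), len(original[0])
    mse = sum((distorted[i][j] - original[i][j]) ** 2
              for i in range(rows) for j in range(cols)) / (rows * cols)
    if mse == 0:
        return float("inf")   # identical images
    return 20.0 * log10(peak / sqrt(mse))


# Hypothetical 2x2 images whose pixels all differ by 5 gray levels (RMSE = 5).
s = [[10, 20], [30, 40]]
f = [[15, 25], [35, 45]]
value = psnr(s, f)
```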






Fig. 6. Compression performances of wavelet systems (PSNR versus compression
ratio): (a) biorthogonal systems, (b) orthogonal systems



However, the binary 9-7 biorthogonal single wavelet system significantly
outperforms the other wavelet systems, because it has a higher order of
approximation and symmetric functions. The biorthogonal Hermite cubic
multiwavelet system with an orthogonal prefilter of approximation order 2 did
not give a desirable compression performance. The reason is that this
orthogonal prefilter is not a good approximation to an interpolation prefilter
because of its lower approximation order (only 2), while the Hermite cubics
have the 4th order approximation. We have yet to find good biorthogonal
multiwavelet filters and prefilters.









7 Conclusion 

In this paper, multiwavelet systems are applied to image compression. Each line
of image data is vectorized for the r channel inputs of a multiwavelet system. A
general method of prefiltering the inputs has been formulated to provide data to
the multiwavelet filter bank, which should enable the reconstruction of the
original data after postfiltering. A design process for an interpolation
prefilter-postfilter pair, if it exists, has been developed, which provides a
better approximation image at each coarser resolution level. These filters must
be of the finite impulse response type; otherwise, an orthogonal prefilter of
some approximation order can be designed. The prefilters and postfilters have
been designed for 3 multiwavelet systems (GHM, cardinal balanced, and Hermite
cubics). Using these filters, image compression performances of orthogonal
multiwavelet systems have been shown to be better than those of the scalar
orthogonal wavelet systems.



References 

1. Chui, C. K.: An Introduction to Wavelets. Volume 1 of Wavelet Analysis and Its
Applications. Academic Press (1992)

2. Geronimo, J. S., Hardin, D. P., Massopust, P. R.: Fractal functions and wavelet
expansions based on several scaling functions. Journal of Approximation Theory
78 (1994) 373-401

3. Donovan, G. C., Geronimo, J., Hardin, D. P.: Intertwining multiresolution analyses
and the construction of piecewise polynomial wavelets. SIAM Journal of
Mathematical Analysis 27 (1996) 1791-1815

4. Donovan, G., Geronimo, J. S., Hardin, D. P., Massopust, P. R.: Construction of
orthogonal wavelets using fractal interpolation functions. SIAM Journal of
Mathematical Analysis 27 (1996) 1158-1192

5. Strela, V., Walden, A. T.: Orthogonal and biorthogonal multiwavelets for signal
denoising and image compression. SPIE Proc. 3391 AeroSense 98, Orlando, Florida,
April 1998 (1998)

6. Selesnick, I. W.: Interpolating multiwavelet bases and the sampling theorem. IEEE
Trans. on Signal Processing 47 (1999) 1615-1621

7. Chui, C. K., Lian, J.: A study on orthonormal multi-wavelets. J. Appl. Numer.
Math. 20 (1996) 273-298

8. Vaidyanathan, P. P.: Multirate Systems and Filter Banks. Prentice-Hall, New
Jersey (1993)

9. Strela, V.: Multiwavelets: Theory and Applications. PhD thesis, Massachusetts
Institute of Technology, Cambridge, Mass. (1996)

10. Plonka, G., Strela, V.: Construction of multiscaling functions with approximation
and symmetry. SIAM Journal of Mathematical Analysis 29 (1998) 481-510

11. Xia, X. G., Geronimo, J. S., Hardin, D. P., Suter, B. W.: Design of prefilters for
discrete multiwavelet transforms. IEEE Trans. on Signal Processing 44 (1996)
25-35

12. Xia, X. G.: A new prefilter design for discrete multiwavelet transforms. IEEE
Trans. on Signal Processing 46 (1998) 1558-1570

13. Miller, J. T., Li, C. C.: Adaptive multiwavelet initialization. IEEE Trans. on Signal
Processing 46 (1998) 3282-3291



36 



Wonkoo Kim and Ching-Chung Li 



14. Xia, T., Jiang, Q.: Optimal multifilter banks: design, related symmetric extension 
transform and application to image compression. IEEE Trans, on Signal Processing 
47 (1999) 1878-1889 33 

15. Jiang, Q.: On the design of multifilter banks and orthogonal multiwavelet bases. 
IEEE Trans, on Signal Processing 46 (1998) 3292-3303 33 

16. Strela, V., Heller, P., Strang, G., Topiwala, P., Heil, C.: The application of mul- 
tiwavelet filter banks to image processing. IEEE Trans, on Image Processing 8 
(1999) 548-563 33 



Reduction of Blocking Artifacts in Both Spatial Domain and 
Transformed Domain 



Wing-kuen Ling and P. K. S. Tam 

Department of Electronic and Information Engineering 
The Hong Kong Polytechnic University 
Hung Hom, Kowloon, Hong Kong 
Hong Kong Special Administrative Region, China 
Tel: (852) 2766-6238, Fax: (852) 2362-8439 
Email: bingo@encserver.eie.polyu.edu.hk 

Abstract. In this paper, we propose a bi-domain technique to reduce the blocking artifacts commonly 
incurred in image processing. Some pixels are sampled in the shifted image block and some high frequency 
components of the corresponding transformed block are discarded. By solving for the remaining unknown 
pixel values and the transformed coefficients, a less blocky image is obtained. Simulation results using the 
Discrete Cosine Transform and the Slant Transform show that the proposed algorithm gives better 
quantitative results and image quality than the existing methods. 



1 Introduction 

Many images are very large, so processing a whole image at once typically requires extensive computation. 
Hence, dividing an image into a number of small blocks of size 8x8 for processing is very common in 
practice, as in the JPEG, MPEG-1/2 and H.261/263 standards. However, block-based coded 
images suffer from a kind of distortion called blocking artifacts, especially when the compression ratio is high. 
Visible boundaries appear between the blocks, and these boundaries are very disturbing. 

Several techniques have been developed to reduce the blocking artifacts. The theory of projections onto 
convex sets (POCS) has been proposed [1], but it requires a large number of iterations for convergence. Methods 
using interleaved image blocks before the encoding were also suggested [2], but they do not conform to the 
coding standards. Lowpass filtering of the block-based coded images was also proposed [3], but it may cause 
serious distortions when the image contains high frequency components. Some adaptive filtering approaches have 
been proposed [4], but their cost is too high. 

In this paper, we propose a bi-domain de-blocking technique, which samples the shifted image block at 
certain fixed locations and discards some high frequency components of the corresponding transformed block. By 
solving for the remaining unknown pixel values and the transformed coefficients, a less blocky image is obtained. 
This idea has been carried out for the Discrete Cosine Transform (DCT) and the Slant Transform. It is found that 
the proposed bi-domain de-blocking technique reduces blocking artifacts effectively. 

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 37-43, 2001. 

© Springer-Verlag Berlin Heidelberg 2001 





2 Vector Representation of an Image Transform 



In the application of transform techniques to image processing, a linear separable orthonormal block 
transform with block size 8x8 can be expressed as $Y = F X F^T$, where $X$ is the 8x8 image block, $Y$ is the 8x8 
transformed block and $F$ is the 8x8 transform matrix. Every element of the matrix $Y$ can be expressed as: 

$$y_{pq} = \sum_{k=1}^{8} \sum_{m=1}^{8} f_{pk}\, x_{km}\, f_{qm}, \qquad (1)$$

where $x_{km}$ is the element at the $k$th row and $m$th column of matrix $X$, $f_{ij}$ is the element at the $i$th row and $j$th column of matrix $F$, and $y_{pq}$ is the element at the 
$p$th row and $q$th column of matrix $Y$. 

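Since $y_{pq}$ is a linear combination of the $x_{km}$, we can express it as follows: 

$$\begin{bmatrix} y_{11} \\ \vdots \\ y_{18} \\ \vdots \\ y_{81} \\ \vdots \\ y_{88} \end{bmatrix} = \begin{bmatrix} f_{11}f_{11} & \cdots & f_{11}f_{18} & \cdots & f_{18}f_{11} & \cdots & f_{18}f_{18} \\ \vdots & & \vdots & & \vdots & & \vdots \\ f_{11}f_{81} & \cdots & f_{11}f_{88} & \cdots & f_{18}f_{81} & \cdots & f_{18}f_{88} \\ \vdots & & \vdots & & \vdots & & \vdots \\ f_{81}f_{11} & \cdots & f_{81}f_{18} & \cdots & f_{88}f_{11} & \cdots & f_{88}f_{18} \\ \vdots & & \vdots & & \vdots & & \vdots \\ f_{81}f_{81} & \cdots & f_{81}f_{88} & \cdots & f_{88}f_{81} & \cdots & f_{88}f_{88} \end{bmatrix} \begin{bmatrix} x_{11} \\ \vdots \\ x_{18} \\ \vdots \\ x_{81} \\ \vdots \\ x_{88} \end{bmatrix} \qquad (2)$$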



The above equation is a vector representation of an image transform from an image vector $x$ to a transformed 
vector $y$ with $y = Cx$, where $C$ is a 64x64 matrix and $x$ is a 64x1 column vector. Similarly, the inverse transform can 
be represented as $x = Ty$ with $T = C^{-1}$, where $T$ is a 64x64 matrix and $y$ is a 64x1 column vector. 
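
The vector representation can be checked numerically. The following is a minimal sketch, assuming that F is the orthonormal 8x8 DCT matrix and that the image block is stacked row by row into the vector x. Under these assumptions C is the Kronecker product of F with itself. The helper name `dct_matrix` and the random test block are illustrative only.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix F, so that Y = F @ X @ F.T."""
    i = np.arange(n).reshape(-1, 1)   # frequency index (row of F)
    k = np.arange(n).reshape(1, -1)   # sample index (column of F)
    F = np.sqrt(2.0 / n) * np.cos((2 * k + 1) * i * np.pi / (2 * n))
    F[0, :] = np.sqrt(1.0 / n)        # the DC row has a different scale factor
    return F

F = dct_matrix()

# The entry of C in row (p, q) and column (k, m) is f_pk * f_qm,
# which is the Kronecker product of F with itself (row-major stacking).
C = np.kron(F, F)
T = np.linalg.inv(C)                  # inverse transform matrix, x = T y

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(8, 8)).astype(float)   # arbitrary test block

y_vec = C @ X.reshape(-1)             # vector form y = C x
Y = F @ X @ F.T                       # matrix form Y = F X F^T
```

Because F is orthonormal, C is orthonormal as well, so in this sketch T equals the transpose of C.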



3 Blocking Effect Model 



By shifting the block-based coded image four pixels both horizontally and vertically, the visible edge is moved to 
the middle of the shifted image block. This shifted image block can be modeled using four 4x4 matrices [5] as 
follows: 

$$X = \begin{bmatrix} X_a & X_b \\ X_c & X_d \end{bmatrix}, \qquad X_a, X_b, X_c, X_d \in \mathbb{R}^{4 \times 4}. \qquad (3)$$

If the compression ratio of the block-based coder is so high that all the AC coefficients of the block-based 
coded image are quantized to zero, then the four matrices of the shifted image block become four constant matrices. 
Consequently, there are only four different pixel values (a, b, c and d) in the shifted coded image block. 



4 Bi-domain De-blocking Algorithm 

Let $x_{ij}^{old}$ be the element at the $i$th row and $j$th column of the unprocessed shifted image block and $y_{ij}^{old}$ be the element at the $i$th row and $j$th 
column of the corresponding transformed block. Similarly, let $x_{ij}^{new}$ be the element at the $i$th row and $j$th column of the processed 
shifted image block and $y_{ij}^{new}$ be the element at the $i$th row and $j$th column of the corresponding transformed block. 

For the ideal terrace image block, that is, a shifted image block with only four different pixel values, 
the minimum number of sampling points in the pixel domain is four. In order to reduce the blocking artifacts, 
those sampling points should be as far from the block edge as possible. Hence, we sample at the corners of 
the shifted image block, so that $x_{11}^{old} = x_{11}^{new}$, $x_{18}^{old} = x_{18}^{new}$, $x_{81}^{old} = x_{81}^{new}$ and $x_{88}^{old} = x_{88}^{new}$. The other pixel values 
are left unknown at this stage and are determined in the next stage. 

In the transformed domain, some coefficients, especially the high frequency components, suffer from the 
blocking artifacts because the block edge always contains high frequency components. Hence, in order to reduce 
the blocking artifacts, we should discard the high frequency components, that is, set $y_{ij}^{new} = 0$ for some $i, j$. 
The remaining transformed coefficients are left unknown at this stage. 

According to the sampling scheme mentioned above, we can pick out the rows of the inverse form of equation (2) 
that correspond to the sampled pixels and break down the matrix multiplication into a sum of two matrix multiplications, as follows: 

$$\begin{bmatrix} x_{11} \\ x_{18} \\ x_{81} \\ x_{88} \end{bmatrix} = \begin{bmatrix} t_{1,1} & \cdots & t_{1,64} \\ t_{8,1} & \cdots & t_{8,64} \\ t_{57,1} & \cdots & t_{57,64} \\ t_{64,1} & \cdots & t_{64,64} \end{bmatrix} \begin{bmatrix} y_{11} \\ \vdots \\ y_{88} \end{bmatrix} = S_1 y_1 + S_2 y_2, \qquad (4)$$

where $t_{ij}$ is the element at the $i$th row and $j$th column of the matrix $T$. The columns of the 4x64 matrix are partitioned so that the above equation takes the form 
$x_1 = S_1 y_1 + S_2 y_2$, where $y_1$ is a vector containing the low frequency components, $y_2$ is a vector containing the high 
frequency components, and $x_1$ is the vector containing the four corners of the shifted image block. $S_1$ and $S_2$ are the 
corresponding submatrices. 

Since we keep four pixel values unchanged after processing, we can set up four linearly independent equations 
in the spatial domain. In order to find the unknown pixel values, we need to set sixty transformed coefficients 
to zero ($y_2^{new} = 0$) and solve for the remaining four transformed coefficients ($y_1^{new}$). The detailed procedure is as 
follows: 

Since $x_1$ remains unchanged after processing, we have $x_1^{new} = x_1^{old}$. As we discard the high frequency 
components of the processed shifted image block, we have $y_2^{new} = 0$. This implies that 
$x_1^{new} = x_1^{old} = S_1 y_1^{new} = S_1 y_1^{old} + S_2 y_2^{old}$, and hence $y_1^{new} = S_1^{-1} x_1^{old}$. Figure 1 shows the block diagram of the proposed 
algorithm. 
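
The procedure above can be sketched numerically for an ideal terrace block. This is an illustrative sketch rather than the authors' implementation. It assumes the orthonormal 8x8 DCT, row-major stacking of the block, corner sampling, and retention of the four lowest frequency coefficients. The quadrant values a, b, c, d and all variable names are chosen only for the example.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix."""
    i = np.arange(n).reshape(-1, 1)
    k = np.arange(n).reshape(1, -1)
    F = np.sqrt(2.0 / n) * np.cos((2 * k + 1) * i * np.pi / (2 * n))
    F[0, :] = np.sqrt(1.0 / n)
    return F

F = dct_matrix()
T = np.kron(F, F).T                   # x = T y (C = F kron F is orthonormal)

# Ideal terrace block: four constant 4x4 quadrants with values a, b, c, d.
a, b, c, d = 10.0, 20.0, 30.0, 50.0
X_old = np.block([[np.full((4, 4), a), np.full((4, 4), b)],
                  [np.full((4, 4), c), np.full((4, 4), d)]])

corner_rows = [0, 7, 56, 63]          # x_11, x_18, x_81, x_88 (0-based)
low_cols = [0, 1, 8, 9]               # y_11, y_12, y_21, y_22 (0-based)

S1 = T[np.ix_(corner_rows, low_cols)]           # 4x4 matrix S_1
x1_old = X_old.reshape(-1)[corner_rows]         # sampled corner pixels
y1_new = np.linalg.solve(S1, x1_old)            # y1_new = S1^{-1} x1_old

y_new = np.zeros(64)
y_new[low_cols] = y1_new                        # y2_new = 0 for the other 60
X_new = (T @ y_new).reshape(8, 8)               # processed block
```

In this sketch the four corner pixels of `X_new` equal those of `X_old`, while the interior varies smoothly between them.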



Fig. 1. Block diagram of the proposed de-blocking technique 

The idea is applied to the Discrete Cosine Transform (DCT) and the Slant Transform (ST) as follows: 

4.1 Discrete Cosine Transform (DCT) 

It can be shown that the only non-zero DCT coefficients for the ideal terrace image block are located at the 
positions (p,q) for p=1,2,4,6,8 and q=1,2,4,6,8. The lowest four frequency components are at the positions (1,1), 
(1,2), (2,1) and (2,2). Hence, we set $y_1^{new} = [y_{11}^{new}\ y_{12}^{new}\ y_{21}^{new}\ y_{22}^{new}]^T$. As $x_1^{new} = x_1^{old} = [a\ b\ c\ d]^T$ and $y_1^{new} = S_1^{-1} x_1^{old}$, it 
can be shown that $y_{11}^{new} = 2(a+b+c+d)$, $y_{12}^{new} = 1.4419(a-b+c-d)$, $y_{21}^{new} = 1.4419(a+b-c-d)$ and 
$y_{22}^{new} = 1.0396(a-b-c+d)$. By doing the IDCT, that is, $X^{new} = F^T Y^{new} F$, where $F$ is the DCT matrix, and expanding 
this IDCT equation, it can be shown that the pixel values in the reconstructed image block are 

$$x_{km}^{new} = \sum_{p=1}^{2} \sum_{q=1}^{2} f_{pk}\, f_{qm}\, y_{pq}^{new}$$

for k=1,2,...,8 and m=1,2,...,8. This equation can be expressed in the form $X^{new} = aA + bB + cC + dD$, where A, B, 
C and D are constant, image-independent matrices, shown diagrammatically in figure 2. These four 
matrices can be viewed as interpolation matrices. Since these interpolation matrices are smooth, the 
reconstructed image block is also smooth and the blocking artifacts are reduced. 
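
The four interpolation matrices can be computed directly from the relations above. The sketch below assumes the orthonormal 8x8 DCT and row-major stacking. The matrix `M` and the helper `dct_matrix` are names introduced only for this illustration. Note that `C` here is the interpolation matrix for the corner value c, not the 64x64 transform matrix.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix."""
    i = np.arange(n).reshape(-1, 1)
    k = np.arange(n).reshape(1, -1)
    F = np.sqrt(2.0 / n) * np.cos((2 * k + 1) * i * np.pi / (2 * n))
    F[0, :] = np.sqrt(1.0 / n)
    return F

F = dct_matrix()
T = np.kron(F, F).T                       # inverse transform, x = T y

corner_rows = [0, 7, 56, 63]              # x_11, x_18, x_81, x_88 (0-based)
low_cols = [0, 1, 8, 9]                   # y_11, y_12, y_21, y_22 (0-based)
S1_inv = np.linalg.inv(T[np.ix_(corner_rows, low_cols)])

# Column i of M maps the i-th corner value (a, b, c, d in turn)
# to all 64 pixels of the processed block.
M = T[:, low_cols] @ S1_inv               # 64 x 4
A, B, C, D = (M[:, i].reshape(8, 8) for i in range(4))

# Example: rebuild a processed block from its four corner values.
a, b, c, d = 10.0, 20.0, 30.0, 50.0
X_new = a * A + b * B + c * C + d * D
```

Each matrix equals one at its own corner and zero at the other three, and the four matrices sum to an all-ones block, so a constant block is left unchanged.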

[Figure 2: panels showing interpolation matrices A, B, C and D] 




4.2 Slant Transform (ST) 

Similarly, it can be shown that the transformed coefficients for the ideal terrace image block are non-zero 
only at the positions (p,q) for p=1,2,6 and q=1,2,6. Hence, we can set $y_1^{new} = [y_{11}^{new}\ y_{12}^{new}\ y_{21}^{new}\ y_{22}^{new}]^T$ and set the 
remaining sixty coefficients to zero ($y_2^{new} = 0$), and use the same method as before to find $X^{new}$. It can be shown that 
$y_{11}^{new} = 2(a+b+c+d)$, $y_{12}^{new} = 1.3093(a-b+c-d)$, $y_{21}^{new} = 1.3093(a+b-c-d)$ and $y_{22}^{new} = (6/7)(a-b-c+d)$. The 
reconstructed image block ($X^{new}$) is now interpolated by four new constant matrices A', B', C' and D' as shown in 
figure 3. 
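
The Slant coefficients can be checked with a short sketch. It assumes that the first two basis vectors of the 8-point Slant Transform are the normalized constant vector and the normalized linear ramp (7, 5, 3, 1, -1, -3, -5, -7). Only these two vectors are needed, because only the four lowest frequency coefficients are retained. All names are illustrative.

```python
import numpy as np

n = np.arange(8)
s0 = np.full(8, 1.0 / np.sqrt(8.0))           # constant basis vector
ramp = 7.0 - 2.0 * n                          # 7, 5, 3, 1, -1, -3, -5, -7
s1 = ramp / np.linalg.norm(ramp)              # linear "slant" basis vector

basis = [s0, s1]
corners = [(0, 0), (0, 7), (7, 0), (7, 7)]    # x_11, x_18, x_81, x_88 (0-based)
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]      # y_11, y_12, y_21, y_22 (0-based)

# S1[r, c] = s_p[k] * s_q[m]: contribution of coefficient (p, q) to pixel (k, m).
S1 = np.array([[basis[p][k] * basis[q][m] for (p, q) in freqs]
               for (k, m) in corners])

# Rows of S1_inv give y_11, y_12, y_21, y_22 as combinations of (a, b, c, d).
S1_inv = np.linalg.inv(S1)
```

Under these assumptions the rows of `S1_inv` reproduce the factors 2, 1.3093, 1.3093 and 6/7 quoted above.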








Fig. 3. Interpolation matrices for Slant de-blocking technique 



5 Simulation Results 



The DCT de-blocking technique and the Slant de-blocking technique are applied adaptively to the JPEG-coded images 
“Tiffany”, “Cancer” and “Woman” of size 512x512. The effectiveness of the proposed algorithm is 
assessed by both quantitative measurement and qualitative evaluation. 

For the quantitative measurement, note that the blocking artifact is mainly due to the grid noise in the monotone areas. 
The intensity of the monotone areas of a natural image usually changes very slowly, but there is a tendency for the 
intensity in the block-based coded image to change abruptly from one block to another, as modeled in Section 3. 
Therefore, we propose the following methodology to measure the quantitative result: 

If the four neighboring 8x8 image blocks are all DC blocks, that is, all the pixel values in each individual block 
are constant, then we sum up the squared error in these four blocks, and compute the mean square error (MSE) over all 
these blocks as follows: 

$$MSE = \frac{1}{N} \sum_{(i,j) \in \Omega} \left[ R(i,j) - O(i,j) \right]^2, \qquad (5)$$

where $O$ is the original image, $R$ is the reconstructed image, $\Omega$ is the region where there are four neighboring 8x8 DC 
blocks and $N$ is the total number of pixels in $\Omega$. 
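
A possible way to evaluate this measure is sketched below. The text does not say which image is tested for DC blocks, so the sketch assumes the test is made on the block-coded image. It also assumes that the region covers every block belonging to at least one 2x2 group of DC blocks. The function name `dc_region_mse` and its arguments are hypothetical.

```python
import numpy as np

def dc_region_mse(original, coded, reconstructed, block=8):
    """MSE between reconstructed and original over 2x2 groups of DC blocks.

    A block counts as a DC block when all its pixels in `coded` are equal
    (assumption: the test is made on the block-coded image).
    """
    rows, cols = coded.shape[0] // block, coded.shape[1] // block
    h, w = rows * block, cols * block
    tiles = coded[:h, :w].reshape(rows, block, cols, block)
    is_dc = tiles.max(axis=(1, 3)) == tiles.min(axis=(1, 3))    # rows x cols

    # quad[r, c] is True when blocks (r, c), (r, c+1), (r+1, c), (r+1, c+1)
    # are all DC blocks.
    quad = is_dc[:-1, :-1] & is_dc[:-1, 1:] & is_dc[1:, :-1] & is_dc[1:, 1:]

    # Mark every block that belongs to at least one such 2x2 group.
    in_region = np.zeros_like(is_dc)
    for dr in (0, 1):
        for dc in (0, 1):
            in_region[dr:dr + rows - 1, dc:dc + cols - 1] |= quad

    # Expand the block-level mask to pixel level.
    mask = np.kron(in_region.astype(int),
                   np.ones((block, block), dtype=int)).astype(bool)
    n_pixels = int(mask.sum())
    if n_pixels == 0:
        return float("nan")            # no qualifying region in this image
    err = reconstructed[:h, :w].astype(float) - original[:h, :w].astype(float)
    return float((err[mask] ** 2).sum() / n_pixels)
```

The function returns NaN when the image contains no 2x2 group of DC blocks, since the measure is undefined in that case.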

Table 1 compares the results of applying some common existing methods with those of our proposed 
de-blocking techniques. It can be seen from Table 1 that our proposed algorithm gives better quantitative results 
(lower MSE) than the existing methods. The qualitative results shown in figure 4 also show that our proposed 
algorithm gives better image quality than the existing methods. 







Fig. 4. Simulation results of the comparison of the existing methods and our proposed algorithms 





Method                                     Tiffany (0.238 bpp)   Cancer (0.139 bpp)   Woman (0.223 bpp)
JPEG coded image                           32.6421               22.6819              24.8619
DCT zero-masking technique [5]             30.4607               19.2036              21.6293
DCT coefficient weighting technique [5]    29.4858               19.0579              20.9516
Bi-domain DCT de-blocking technique        26.9824               16.7301              19.908
Bi-domain Slant de-blocking technique      27.0419               16.4105              20.5486

Table 1. Simulation results of calculated MSE by applying some common existing methods and our proposed algorithms 



6 Concluding Remarks 



In this paper, we propose a bi-domain de-blocking technique, which samples some pixel values in the shifted 
image block and discards some high frequency components in the corresponding transformed block. By solving 
for the remaining unknown pixel values and the transformed coefficients, we obtain a less blocky image. 

The proposed algorithm can be applied to the enhancement of block-based coded images with very high 
compression ratios. The given image can first be compressed to a very high compression ratio through the 
block-based coder, and then the blocky image is enhanced by the proposed algorithm. The simulation results using 
the Discrete Cosine Transform and the Slant Transform show that the blocking artifacts are reduced significantly. 

Further research work will study the effect on the image quality of the number of sampling points and the 
number of discarded transformed coefficients. The positions of the sampling points and the transformed 
coefficients will also be considered. 

Acknowledgement 

The work described in this paper was substantially supported by a grant from the Hong Kong Polytechnic 
University with account number G-V968. 

References 

1. Zakhor A.: Iterative procedures for reduction of blocking effects in transform image coding. IEEE Transactions 
on Circuits and Systems for Video Technology, Vol. 2, No. 1. (1992) 91-95. 

2. Malvar H. S. and Staelin D. H.: The LOT: Transform coding without blocking effects. IEEE Transactions on 
Acoustics, Speech, and Signal Processing, Vol. 37, No. 4. (1989) 553-559. 

3. Reeve H. C. and Lim J. S.: Reduction of blocking effects in image coding. Optical Engineering, Vol. 23, No. 1. 
(1984) 34-37. 

4. Lee Y. L., Kim H. C. and Park H. W.: Blocking effect reduction of JPEG images by signal adaptive filtering. 
IEEE Transactions on Image Processing, Vol. 7, No. 2. (1998) 229-234. 

5. Ling W. K. and Zeng B.: A novel method for blocking effect reduction in DCT-coded images. Proceedings of 
the 1999 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 4. (1999) 46-49. 




Simple and Fast Subband De-blocking Technique by Discarding the 
High Band Signals 



Wing-kuen Ling and P. K. S. Tam 

Department of Electronic and Information Engineering 
The Hong Kong Polytechnic University 
Hung Hom, Kowloon, Hong Kong 
Hong Kong Special Administrative Region, China 
Tel: (852) 2766-6238, Fax: (852) 2362-8439 
Email: bingo@encserver.eie.polyu.edu.hk 

Abstract. In this paper, we propose a simple and fast post-processing de-blocking technique to reduce 
blocking artifacts. The block-based coded image is first decomposed into several subbands. Only the low 
frequency subband signals are retained and the high frequency subband signals are discarded. The 
remaining subband signals are then reconstructed to obtain a less blocky image. The ideas are demonstrated 
by a cosine filter bank and a modulated sine filter bank. The simulation result shows that the proposed 
algorithm is effective in the reduction of blocking artifacts. 



1 Introduction 

Transform codecs, such as those based on the Discrete Cosine Transform (DCT), are simple codecs widely applied in the 
industry. However, they usually produce undesirable blocking artifacts at high compression ratios. This is because each block in 
an image is transformed independently, and the correlation among adjacent blocks is not exploited. Thus, at a high compression 
ratio, quantization errors lead to blocking artifacts. 

In order to tackle this problem, the lapped transform before encoding was proposed to capture the correlation information 
among the adjacent blocks [8]. However, this pre-processing technique requires a decrease of compression ratio and so it is not 
adopted in the international standard. Some subband de-blocking techniques [2, 3, 4] have also been proposed, but they are too 
complex in terms of implementation and computation. 

In this paper, we propose a simple and fast post-processing subband de-blocking technique, which discards some high band 
signals and retains the remaining low band signals. The algorithm is tested by the cosine filter banks and the modulated sine filter 
banks. The simulation results show that this algorithm can suppress the blocking artifact effectively in both the quantitative 
measurement and the qualitative evaluation. 

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 44-48, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 

2 De-blocking System 

Since a block-based transform and a lapped transform can be viewed as a discrete-time linear periodically time-varying system, 
it can be realized by a filter bank structure [1]. Because block edges always contain high frequency components [5], we 
propose to retain the low frequency band signals and discard the high frequency band signals. 

The more low band signals are retained, the more block boundaries will be captured in the reconstructed image. However, 
the image details will be destroyed if we keep only a very small subset of the subband signals. We have conducted extensive 
simulations and found that the best performance is obtained by retaining two subband signals and discarding the remaining high band 
signals. 

The block diagram of the subband de-blocking system is shown in figure 1. There are many ways to select the analysis 
filters, $h_j[n]$, for j=0,1,...,M, and the synthesis filters, $f_j[n]$, for j=0,1,...,M, where the quantizers are designed as $Q_j(x) = x$ for 
j=0,1, and $Q_j(x) = 0$ for j=2,3,...,M. The filters should be designed to give a perfect reconstruction system when the quantizers are 
removed, because in a perfect reconstruction system the error introduced by the filter bank structure itself is eliminated. 
In this paper, a cosine filter bank [6] and a modulated sine filter bank [7] are selected to demonstrate this idea. 




Fig. 1. Block diagram of subband de-blocking technique 



2.1 Cosine Filter Bank 

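The impulse responses of the synthesis filters, $f_j[n]$, for j=0,1,...,7, are the transform basis functions of the DCT and the 
impulse responses of the analysis filters, $h_j[n]$, for j=0,1,...,7, are equal to the time-reversed basis functions [6] as follows: 

$$h_j[n] = c_j \cos\left( \frac{(2(7-n)+1)\, j \pi}{16} \right) \qquad (1)$$

for j=0,1,...,7 and for n=0,1,...,7, where: 

$$c_j = \begin{cases} \dfrac{1}{2\sqrt{2}}, & j = 0, \\ \dfrac{1}{2}, & \text{otherwise,} \end{cases}$$

$$f_j[n] = c_j \cos\left( \frac{(2n+1)\, j \pi}{16} \right) \qquad (2)$$

for j=0,1,...,7 and for n=0,1,...,7. 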
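
The cosine filter bank can be illustrated with a short one-dimensional sketch. It assumes an 8-channel bank that is maximally decimated (downsampling and upsampling by 8), with orthonormal DCT basis functions as synthesis filters and their time reversals as analysis filters. The decimation factor, the function name `subband_deblock` and the `keep` parameter are assumptions made for the illustration and are not taken from the text.

```python
import numpy as np

M = 8
n = np.arange(M)
c = np.where(np.arange(M) == 0, np.sqrt(1.0 / M), np.sqrt(2.0 / M))

# f[j] is the j-th DCT basis function (synthesis filter);
# h[j] is its time reversal (analysis filter).
f = np.array([c[j] * np.cos((2 * n + 1) * j * np.pi / (2 * M)) for j in range(M)])
h = f[:, ::-1]

def subband_deblock(x, keep=2):
    """Analyse x with h_j, keep the lowest `keep` bands, synthesise with f_j."""
    x = np.asarray(x, dtype=float)
    assert x.size % M == 0, "this sketch assumes len(x) is a multiple of 8"
    out = np.zeros_like(x)
    for j in range(keep):                       # Q_j(x) = x for the kept bands
        y_j = np.convolve(x, h[j])[M - 1::M]    # analysis filter, decimate by 8
        up = np.zeros_like(x)
        up[::M] = y_j                           # upsample by 8
        out += np.convolve(up, f[j])[:x.size]   # synthesis filter
    return out                                  # bands j >= keep contribute zero

rng = np.random.default_rng(1)
signal = rng.normal(size=32)
restored = subband_deblock(signal, keep=M)      # quantizers removed
lowband = subband_deblock(signal, keep=2)       # two lowest bands only
```

With all eight bands kept, the sketch reproduces its input, which is the perfect reconstruction property required when the quantizers are removed.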

2.2 Modulated Sine Filter Bank 

The modulated sine filter bank is similar to the cosine filter bank except that the impulse responses of the synthesis filters, 
$f_j[n]$, for j=0,1,...,7, are the transform basis functions of the modulated sine transform and the impulse responses of the analysis 
filters, $h_j[n]$, for j=0,1,...,7, are equal to its time-reversed basis functions [7]. 

[Equations (3) and (4), which define $h_j[n]$ and $f_j[n]$ for j=0,1,...,7 and n=0,1,...,7, are illegible in the source.] 



3 Simulation Results 



The proposed de-blocking technique is applied adaptively to the JPEG-coded image “Cancer” of size 512x512. The 
effectiveness of the proposed algorithm is assessed by both quantitative measurement and qualitative evaluation. 

For the quantitative measurement, note that the blocking artifact is mainly due to the grid noise in the monotone areas. Since the 
intensity of the monotone areas of most natural images changes very slowly, whereas the intensity in the block-based 
coded image tends to change abruptly from one block to another, we propose the following methodology to measure this effect: 
if the four neighboring 8x8 image blocks are all DC blocks, that is, all the pixel values in each individual block are constant, 
then we sum up the squared error in these four blocks, and finally we compute the mean square error (MSE) over all these blocks as 
follows: 

$$MSE = \frac{1}{N} \sum_{(i,j) \in \Omega} \left[ R(i,j) - O(i,j) \right]^2,$$

where $O$ is the original image, $R$ is the reconstructed image, $\Omega$ is the region where there are four neighboring 8x8 DC blocks and $N$ 
is the total number of pixels in $\Omega$. 

Table 1 compares the results of applying existing methods with those of our proposed de-blocking technique. It can 
be seen from Table 1 that our proposed algorithm gives better quantitative results (lower MSE) than the existing methods. The 
qualitative results shown in figure 2 also demonstrate that our proposed algorithm gives better image quality than the 
existing methods. 





Method                                     Cancer (0.139 bpp)
JPEG coded image                           22.6819
DCT zero-masking technique [5]             19.2036
DCT coefficient weighting technique [5]    19.0579
Cosine de-blocking technique               17.9483
Modulated sine de-blocking technique       18.4175

Table 1. Simulation results calculated by MSE of applying existing methods and our proposed algorithms 



4 Concluding Remarks 

In this paper, we have proposed a simple and fast post-processing subband de-blocking technique, which discards the high 
band signals and only retains the lowest two low band signals. This algorithm is tested by a cosine filter bank and a modulated 
sine filter bank. The simulation results show that our proposed method is very effective. 

Since it adopts the existing transform codec and does not affect the compression ratio, the proposed algorithm can be applied 
to the enhancement of block-based coded images with very high compression ratios. The given image can first be compressed to a very 
high compression ratio through the block-based coder, and then the blocky image is enhanced by the proposed algorithm. 

Further research work will focus on finding the filter bank that gives the highest coding gain. 









Fig. 2. Simulation results of the comparison of the existing methods and our proposed algorithms 



Acknowledgement 



The work described in this paper was substantially supported by a grant from the Hong Kong Polytechnic University with 
account number G-V968. 



References 



1. Malvar H. S.: Extended Lapped Transforms: Properties, Applications, and Fast Algorithms. IEEE Transactions on Signal 
Processing, Vol. 40, No. 11. (1992) 2703-2714. 

2. Sung W. H., Chan Y. H. and Siu W. C.: Subband Adaptive Regularization Method for Removing Blocking Effect, in Proc. ICIP, 
Vol. 2. (1995) 523-526. 

3. Rabiee H. R. and Kashyap R. L.: Image De-blocking with Wavelet-Based Multiresolution Analysis and Spatially Variant OS 
Filters, in Proc. ICIP, Vol. 1. (1997) 318-321. 

4. Hsung T. C., Lun P. K. and Siu W. C.: A Deblocking Technique for JPEG Decoded Image Using Wavelet Transform Modulus 
Maxima Representation, in Proc. ICIP, Vol. 1. (1996) 561-564. 

5. Ling W. K. and Zeng B.: A Novel Method for Blocking Effect Reduction in DCT-Coded Images, in Proc. ISCAS, Vol. 4. (1999) 
46-49. 

6. Malvar H. S.: Lapped Transforms for Efficient Transform/Subband Coding. IEEE Transactions on Acoustics, Speech, and 
Signal Processing, Vol. 38, No. 6. (1990) 969-978. 

7. Malvar H. S.: Efficient Signal Coding with Hierarchical Lapped Transforms, in Proc. ICASSP, Vol. 3. (1990) 1519-1522. 

8. Malvar H. S. and Staelin D. H.: The LOT: Transform coding without blocking effects. IEEE Transactions on Acoustics, 
Speech, and Signal Processing, Vol. 37. (1989) 553-559. 




A Method with Scattered Data Spline and 
Wavelets for Image Compression* 



Guan Lütai and Lu Feng 

Department of Scientific Computing and Computer Applications 
Zhongshan University, Guangzhou 510275, P. R. China 



Abstract. This paper presents a method for image compression. First, 
some scattered data points are selected on some lines of a plane to construct 
an interpolating spline surface that approximates the image; then, one kind of 
wavelets for this spline function is given. By applying different codes to the spline 
and the wavelets, the image compression is accomplished. 



1 Introduction 

In [1], we discussed spline-wavelets of plane scattered data for data compression. 
The basic idea is to use spline interpolation first and then to compress the data 
with spline-wavelets. 

In [3]-[7], several different multivariate spline interpolation methods for scattered 
data were given. If the image data are regarded as lying on some lines, the locally 
supported multivariate splines for scattered data in Hilbert space of [1] can be simplified for this case. 

In this paper, a method for image compression is presented. First, some 
scattered data points are selected on some lines to construct an interpolating spline 
surface approximating the image; then a spline-wavelet decomposition of the 
spline function is given, and different codes are applied to the spline and the wavelets 
to accomplish the image compression. 

2 Polynomial Natural Spline Local Basis Interpolation 
for Large Scattered Data on Some Lines 

Problem I: Given scattered data points on some lines of a plane, $(x_i, y_{ij})$, $i = 1, 2, \ldots, n_0$; 
$j = 1, 2, \ldots, m_i$, and real numbers $z_{ij}$, $i = 1, 2, \ldots, n_0$; $j = 1, 2, \ldots, m_i$, 
find a function $f(x,y) \in H^{m,n}(R) \cap D_z$ satisfying 

$$J_1(f) = \min_{u \in H^{m,n}(R) \cap D_z} J_1(u),$$

where 

$$H^{m,n}(R) = \{\, u(x,y) \mid u^{(m,n)} \in L_2(R),\ u^{(\alpha,\beta)} \text{ is an absolutely continuous function},\ \alpha = 0, \ldots, m-1,\ \beta = 0, \ldots, n-1;\ (x,y) \in R = [a,b] \times [c,d] \,\},$$

$$D_z = \{\, u(x,y) \mid u(x_i, y_{ij}) = z_{ij},\ i = 1, \ldots, n_0,\ j = 1, \ldots, m_i \,\}.$$

Let $u^{(m,n)}(x,y) = \dfrac{\partial^{m+n} u(x,y)}{\partial x^m\, \partial y^n}$ and 

$$J_1(u) = \iint_R \left( u^{(m,n)}(x,y) \right)^2 dx\, dy + \sum_{\nu=0}^{n-1} \int_a^b \left( u^{(m,\nu)}(x,c) \right)^2 dx + \sum_{\mu=0}^{m-1} \int_c^d \left( u^{(\mu,n)}(a,y) \right)^2 dy.$$

We call the solution of Problem I the natural interpolation spline function 
for scattered data on some lines. 

* This work is supported by Natural Science Foundation of Guangdong (9902275), 
Foundation of Zhongshan University Advanced Research Centre. 

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 49-53, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 



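Theorem 1 A natural interpolation spline function for scattered data on some 
lines, $f(x,y)$, has the following explicit and closed-form expression: 

$$f(x,y) = \sum_{i=1}^{n_0} \sum_{j=1}^{m_i} a_{ij}\, G_1(x, x_i)\, G_2(y, y_{ij}) + \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} c_{ij}\, x^i y^j,$$

where $G_1(x, x_i)$ and $G_2(y, y_{ij})$ are kernel functions. [Their explicit expressions, which contain the factors 
$(2m-1)!$, $\mu!\,(2m-\mu-1)!$ and $(2n-1)!$, $\nu!\,(2n-\nu-1)!$ with sums over $\mu$ and $\nu$ starting from 0, are illegible in the source.] 

Let 

$$B_{ij}(y) = \begin{vmatrix} 1 & y_{ij} & \cdots & y_{ij}^{2n-1} & G_2(y, y_{ij}) \\ 1 & y_{i,j+1} & \cdots & y_{i,j+1}^{2n-1} & G_2(y, y_{i,j+1}) \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & y_{i,j+2n} & \cdots & y_{i,j+2n}^{2n-1} & G_2(y, y_{i,j+2n}) \end{vmatrix}.$$

If $j + 2n > m_i$, then for $0 < \varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_{2n}$, let $y_{i,m_i+1} = y_{i,m_i} + \varepsilon_1, \ldots, y_{i,m_i+2n} = y_{i,m_i} + \varepsilon_{2n}$. 

It can be proved that $B_{ij}(y)$ is a B-spline with knots $y_{ij}, y_{i,j+1}, \ldots, y_{i,j+2n}$. 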

Theorem 2 A natural interpolation spline function for scattered data on some 
lines, $f(x,y)$, has the following local basis explicit and closed-form expression: 

$$f(x,y) = \sum_{i=1}^{n_0} \sum_{j=1}^{m_i} a_{ij}\, G_1(x, x_i)\, B_{ij}(y) + \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} c_{ij}\, x^i y^j.$$



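Theorem 3 The coefficients $a_{ij}$ and $c_{ij}$ 
of a natural interpolation spline function for scattered data on some lines, $f(x,y)$, 
can be solved from the following linear system: 

$$\begin{bmatrix} A & B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} \tilde{A} \\ C \end{bmatrix} = \begin{bmatrix} Z \\ 0 \end{bmatrix},$$

where $Z$ is the given set of real numbers $\{ z_{ij} \mid i = 1, \ldots, n_0;\ j = 1, \ldots, m_i \}$, $\tilde{A}$ is the unknown 
coefficient vector $(a_{ij})_{i=1,\ldots,n_0;\ j=1,\ldots,m_i}$ and $C = (c_{ij})_{i=0,\ldots,m-1;\ j=0,\ldots,n-1}$. The elements of 
matrix $B$ are $x_i^{\mu} y_{ij}^{\nu}$, $\mu = 0, \ldots, m-1$; $\nu = 0, \ldots, n-1$; $i = 1, \ldots, n_0$; $j = 1, \ldots, m_i$. 
The elements of matrix $A$ are $a_{\alpha\beta}^{ij} = G_1(x_\alpha, x_i)\, B_{ij}(y_{\alpha\beta})$, $i, \alpha = 1, \ldots, n_0$; $\beta = 
1, \ldots, m_\alpha$; $j = 1, \ldots, m_i$, and $0$ is a zero matrix. 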
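
The block structure of this linear system can be illustrated on a much simpler one-dimensional problem. The sketch below is a stand-in and not the paper's bivariate kernel. It fits a natural cubic spline of the form $s(t) = \sum_i a_i |t - x_i|^3 + c_0 + c_1 t$, whose coefficients satisfy a system of the same shape, with a kernel block, a polynomial block, and a zero block. The function names and the sample data are hypothetical.

```python
import numpy as np

def fit_natural_cubic(x, z):
    """Solve [[A, B], [B^T, 0]] [a; c] = [z; 0] for a 1-D natural cubic spline.

    s(t) = sum_i a_i |t - x_i|^3 + c_0 + c_1 t
    (a 1-D stand-in for illustration, not the kernel used in the paper).
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    A = np.abs(x[:, None] - x[None, :]) ** 3          # kernel block
    B = np.column_stack([np.ones_like(x), x])         # polynomial block (1, t)
    K = np.block([[A, B], [B.T, np.zeros((2, 2))]])
    sol = np.linalg.solve(K, np.concatenate([z, np.zeros(2)]))
    return sol[:x.size], sol[x.size:]

def evaluate(t, x, a, c):
    """Evaluate the fitted spline at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(x, dtype=float)
    return (np.abs(t[:, None] - x[None, :]) ** 3) @ a + c[0] + c[1] * t

x_nodes = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
z_vals = np.array([1.0, 3.0, 2.0, 4.0, 0.5])
a_coef, c_coef = fit_natural_cubic(x_nodes, z_vals)
fitted = evaluate(x_nodes, x_nodes, a_coef, c_coef)
```

The lower block row enforces the side conditions on the kernel coefficients, so data that are exactly linear are reproduced by the polynomial part alone.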



3 Algorithm for Polynomial Natural Interpolation 
Splines on Some Lines 



An algorithm for constructing an interpolating spline surface that approximates the image on some 
lines is given as follows: 

1) Select suitable points on some lines: $(x_i, y_{ij})$, $i = 1, \ldots, n_0$; $j = 1, \ldots, m_i$. For 
an image with $k$ rows and $l$ columns ($k < l$), we use the even rows as the lines. 
On every line, we take the image points where the color changes suddenly as our 
suitable points, and then add one or two points between two such image points 
where the colors in between are similar. 

2) For $n = 1$ or $m = 1$, and for $n = 2$ or $m = 2$, compute the non-zero elements of 
matrix $A$ and matrix $B$. 

When $m = 1$, $G_1(x_\alpha, x_i) = (x_i - x_\alpha)_+ + a - x_i - 1$; 

when $m = 2$, [the expression for $G_1(x_\alpha, x_i)$ is illegible in the source]. 

$B_{ij}$ is a B-spline: when $n = 1$, it is a piecewise polynomial of degree one; 
when $n = 2$, it is a piecewise polynomial of degree three. We can 
use the following formula: 

$$B_{ij}(y) = B_{ij}^{n}(y) = \frac{y - y_{ij}}{y_{i,j+n} - y_{ij}}\, B_{ij}^{n-1}(y) + \frac{y_{i,j+n+1} - y}{y_{i,j+n+1} - y_{i,j+1}}\, B_{i,j+1}^{n-1}(y),$$

$$B_{ij}^{0}(y) = \begin{cases} 1, & y \in [y_{ij}, y_{i,j+1}], \\ 0, & \text{otherwise.} \end{cases}$$

3) Use the generalized conjugate gradient acceleration of an iterative method to find 
the solution of the spline interpolation problem (Theorem 3). 
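
The B-spline recursion used in step 2 can be sketched as follows. This is a plain recursive evaluation written for clarity rather than speed. It uses a half-open interval in the degree-0 case so that a shared knot is not counted twice, which is an assumption of this sketch. The function name and the knot values are illustrative.

```python
def bspline_basis(j, degree, knots, y):
    """Value at y of the B-spline of the given degree whose support starts
    at knots[j], computed with the two-term recursion."""
    if degree == 0:
        # Half-open interval (an assumption of this sketch) so that adjacent
        # pieces do not both include a shared knot.
        return 1.0 if knots[j] <= y < knots[j + 1] else 0.0
    left_den = knots[j + degree] - knots[j]
    right_den = knots[j + degree + 1] - knots[j + 1]
    left = 0.0
    if left_den != 0:
        left = (y - knots[j]) / left_den * bspline_basis(j, degree - 1, knots, y)
    right = 0.0
    if right_den != 0:
        right = ((knots[j + degree + 1] - y) / right_den
                 * bspline_basis(j + 1, degree - 1, knots, y))
    return left + right

knots = [0.0, 1.0, 2.5, 3.0, 4.5, 5.0, 6.0, 7.0]   # arbitrary increasing knots

# Degree-1 "hat" function on knots 0, 1, 2.5, evaluated at its middle knot.
hat_peak = bspline_basis(0, 1, knots, 1.0)

# The four cubic B-splines overlapping [3.0, 4.5) sum to one there.
cubic_sum = sum(bspline_basis(j, 3, knots, 3.7) for j in range(4))
```

Each basis function is non-zero only on a few consecutive knot intervals, which is the local support property that makes the coefficient matrix sparse.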



4 Wavelets for the Polynomial Natural Interpolation 
Splines on Some Lines 

Just as in the univariate case [2] and the bivariate case [1], we can define a multiresolution 
analysis for the polynomial natural interpolation spline on some lines. Then 
a theorem stating that the polynomial natural interpolation spline basis on some lines 
is a basis of the scaling function space is given, and the dimension of this space is discussed. 

Denote by $S_{2m,2n}$ the spline space with natural spline local basis on some lines 
of order $(2m, 2n)$, and by $S^0_{2m,2n}$ the subspace of $S_{2m,2n}$ with zero condition on the 
refinement points on some lines. A similar theorem, in which the differential 
operator $D^{m,n}$ of order $(m, n)$ acts on $S^0_{2m,2n}$ to give the wavelet space for the local basis polynomial 
natural spline on some lines, is obtained. Then, using the Lagrange interpolation 
method, a wavelet basis of this wavelet space is constructed. 

5 Image Compression Algorithm 

Using the local basis polynomial natural splines on some lines and the wavelets for these 
splines, an algorithm for image compression is 
as follows: 

1) Using the algorithm in Section 3, select suitable points on some lines and construct 
the local basis polynomial natural spline interpolating on these lines to approximate 
the image data. 

2) Treat these data by wavelet decomposition, where the wavelets are those 
for the local basis polynomial natural spline on some lines given in Section 4. 

3) Use different coding methods to compress these data. 

6 Conclusion 

Owing to the local support property of the B-spline, the coefficient matrix is sparse, 
and it contains more zero elements than the matrix in [1]. 

Acknowledgements 

This work is supported by the Foundation of Zhongshan University Advanced 
Research Centre. This work is supported by Natural Science Foundation of 
Guangdong (9902275). 

References 

1. Liitai Guan, Spline-wavelet of plane scattered data for data compression, in 
"ICMI'99 Proceedings", Hong Kong Baptist University (1999), VI-127-VI-131 
(ISBN 962-85415-2-8). 

2. Liitai Guan, Spline-wavelets of free knots for signal processing, in "Proceeding of 
ICSP'96", eds. Yuan Baozhong & Tang Xiaofang, IEEE Press (1996) 311-314. 

3. Chui, C. K. and L. T. Guan, Multivariate polynomial natural splines for interpolation 
of scattered data and other applications, in "Workshop on Computational 
Geometry", eds. A. Conte et al., World Scientific (1993) 77-96. 







4. Liitai Guan, Bivariate polynomial natural spline for smoothing or generalized 
interpolation of the scattered data, Chinese J. of Num. Math. & Appl. 16:1 (1994) 
1-14. 

5. Dong Hong, Recent progress on multivariate spline, in "Approximation Theory: 
in memory of A. K. Varma", eds. N. K. Govil, Marcel Dekker Inc., N. Y. (1998) 
265-291. 




A Wavelet-Based Preprocessing for Moving Object 
Segmentation in Video Sequences 



Li-Chang Liu 1, Jong-Chih Chien 1, Henry Y. Chuang 2, and Ching-Chung Li 1 

1 Department of Electrical Engineering, University of Pittsburgh, 
Pittsburgh, PA 15261, USA 
{lilst4, jocst4}@pitt.edu, 
ccl@ee.pitt.edu 

2 Department of Computer Science, University of Pittsburgh, 
Pittsburgh, PA 15261, USA 
chuang@cs.pitt.edu 



Abstract. A simple preprocessing method for extracting boundary regions 
of moving objects in a video sequence is presented. We use 
Chui's oversampled shift-invariant wavelet transform and multiresolution 
motion estimation and compensation in the wavelet domain. 
Dominant prediction errors often appear along the boundary of a moving 
object. Our algorithm detects boundary regions at a 
coarse scale by utilizing the prediction error information provided in all 
subband images at the coarse resolution. This is taken as our first step 
toward video object segmentation for use in wavelet-based 
MPEG-4. 



1 Introduction 

Object-based video coding requires the initial segmentation of objects in video 
frames. Video objects can be characterized by their shape, texture and motion; the 
problem of automatic segmentation of objects in motion is a complex and tedious task 
[1, 2, 3, 6, 7, 8, 9]. In this paper, we present a wavelet-based preprocessing method for 
coarse extraction of moving object boundaries within the setting of multiresolution 
block-based motion estimation and motion compensation. 

The recent work of Al-Mohimeed and Li [4] investigated motion estimation and 
compensation in the wavelet domain for video compression, based on the Chui-Shi-Chan 
oversampled frame shift-invariant wavelet transform [5]. Using a minimal oversampling 
rate of 3, a 1-dimensional signal is interpolated, by spline interpolation, 
with two middle points between each pair of successive pixels to give 3 channels of 
data for wavelet decomposition. At each resolution level, these three channels of 
decomposition are appropriately combined, resulting in an almost shift-invariant 
wavelet transform as shown in Fig. 1. With the same interpolation along both horizontal 
and vertical scans, and using the tensor product of these 1-dimensional 
transforms, one obtains an almost shift-invariant 2-dimensional wavelet transform of 

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 54-64, 2001. 

Springer-Verlag Berlin Heidelberg 2001 







an image. Let the subband images in the wavelet decomposition at a given resolution level 
be denoted by LL (scaling image) and LH, HL, and HH (wavelet images), respectively, 
and let the decomposition be carried out to J resolution levels (j = 1, 2, ..., J). Because 
of the shift-invariance, block matching in each subband image in successive 
frames yields a reliable block motion estimate and a reduction of prediction errors, 
leading to improved performance in video compression. The prediction errors in 
multiple subband images also readily provide boundary information on objects in 
motion. This leads to an efficient preprocessing method for moving object segmentation, 
as described in the following sections. 
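The oversampling step can be illustrated with a small sketch. It assumes linear interpolation (the interpolation reported for the experiments in Section 4) and assumes that the 3 channels are the three interleaved sample streams of the interpolated signal; the function name and this channel split are illustrative assumptions, and the way the decomposed channels are recombined is not shown.

```python
def oversample_channels(signal, rate=3):
    """Insert rate-1 linearly interpolated points between samples, then split into `rate` channels."""
    dense = []
    for a, b in zip(signal, signal[1:]):
        for k in range(rate):
            # k = 0 reproduces the original sample, k > 0 gives the middle points.
            dense.append(a + (b - a) * k / rate)
    dense.append(signal[-1])
    # Channel k keeps every `rate`-th sample starting at offset k.
    return [dense[k::rate] for k in range(rate)]
```

Channel 0 is the original signal, while channels 1 and 2 are the signal sampled one third and two thirds of a pixel later, so a sub-pixel shift of the input moves energy between channels rather than changing the decomposition drastically.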







Fig. 1 . The Chui-Shi-Chan Oversampled (rate=3) Wavelet Decomposition 



2 Boundary Regions of Moving Objects 

Let us consider a pair of matching blocks illustrated in Fig. 2(a), where the top block 
is in the current frame and the bottom block is in the previous frame. If a block contains 
partly a moving object and partly the background, the object motion may bring 
in different background in the immediate vicinity of its boundary. Thus, block 
matching will yield a relatively large absolute value of the prediction error in the 
immediate neighborhood outside the object boundary. Alternatively, if the block matching 
is improper, as illustrated in Fig. 2(b), the prediction error along and near the boundary 
will have large absolute values. At a coarse resolution level, such regions will become more 
pronounced in one or more subband images, and will keep the object boundary of the 
original resolution within their interiors. This is the model upon which our 
preprocessing algorithm for extraction of moving object boundaries is developed. 
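To make the role of block matching concrete, here is a minimal full-search sketch for one block. The sum of absolute differences (SAD) is assumed as the matching criterion, since the text does not name one; the function names and the search radius parameter are illustrative.

```python
def block_sad(cur, prev, r, c, dr, dc, size):
    """Sum of absolute differences between a block of `cur` and a displaced block of `prev`."""
    return sum(abs(cur[r + i][c + j] - prev[r + dr + i][c + dc + j])
               for i in range(size) for j in range(size))

def full_search(cur, prev, r, c, size, radius):
    """Return (sad, dr, dc) for the best displacement of the block at (r, c)."""
    rows, cols = len(prev), len(prev[0])
    best = None
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            inside = (0 <= r + dr and r + dr + size <= rows
                      and 0 <= c + dc and c + dc + size <= cols)
            if not inside:
                continue  # the displaced block would leave the frame
            sad = block_sad(cur, prev, r, c, dr, dc, size)
            if best is None or sad < best[0]:
                best = (sad, dr, dc)
    return best
```

The residual left after subtracting the best-matching block is the prediction error; where a block straddles an object boundary, no single displacement fits both object and background, so the residual stays large there.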





















At the resolution level j, let the four subband images LL, LH, HL and HH be indexed 
as i = 1, 2, 3 and 4, respectively. Consider first the prediction error in the scaling 
image LL (i = 1) at the resolution level j. If the absolute value of the prediction 
error e at a point is greater than a chosen threshold $E_i$, that point is taken as a 
candidate point in the boundary region at the resolution level j. All the candidate points in 
the subband image are processed by connected component labeling, resulting in a 
number of connected regions. With a chosen threshold for the size of each connected 
region (number of points per region), one detects $Q_i$ portions of the moving object 
boundaries in this subband image, designated as $R_{j,i,q}$ (i = 1; q = 1, 2, ..., $Q_i$). A bounding 
rectangle can be constructed for each of these subregions. Similar processing is 
done for each of the wavelet images LH (i = 2), HL (i = 3) and HH (i = 4), giving significant 
boundary subregions $R_{j,i,q}$ (i = 2, 3, 4) detected in these subband images. These 
boundary subregions will be merged with, and complement, those subregions detected 
in the scaling image. The merging process is done as follows. Consider a larger 
rectangular neighborhood $N_{qi}$ enclosing the bounding rectangle of $R_{j,i,q}$ detected in the 
scaling image. Any $R_{j,i,q}$ (i = 2, 3, 4; q = 1, 2, ..., $Q_i$) detected in the corresponding wavelet 
images whose bounding rectangle intersects any $N_{qi}$ will be pooled in the union; 
those not intersecting any $N_{qi}$ will be eliminated. The union of the retained 
subregions is then processed by morphological closing operations to give the 
extracted boundary regions of moving objects at the particular resolution level j. 
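The candidate detection and labeling described above can be sketched in plain Python. This is an illustration under stated assumptions: 8-connectivity is assumed because the text does not specify the connectivity, rectangles are returned as (top, left, bottom, right) tuples, and the function name is hypothetical.

```python
from collections import deque

def boundary_subregions(err, threshold, min_size):
    """Bounding rectangles of connected regions where |err| exceeds `threshold`."""
    rows, cols = len(err), len(err[0])
    seen = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or abs(err[r][c]) <= threshold:
                continue
            # Breadth-first flood fill over 8-connected candidate points.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and abs(err[ny][nx]) > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Size threshold: small regions are treated as noise and dropped.
            if len(region) >= min_size:
                ys = [p[0] for p in region]
                xs = [p[1] for p in region]
                rects.append((min(ys), min(xs), max(ys), max(xs)))
    return rects
```

Running the same function on each of the four subband error images gives the per-subband rectangles that the merging step then compares.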





Fig. 2. Matching Blocks Containing an Object Boundary Produce Large Values of the Predic- 
tion Error in the Immediate Vicinity of the Moving Boundary: (a) Object Boundary Moves into 
Different Background; (b) Improper Matching 



3 Moving Boundary Region Detection Algorithm 

Our algorithm is described in the following steps: 

Step 1. Use the shift-invariant wavelet transform to decompose the images in a video 
sequence to three resolution levels. Compute the block-matched motion estimation 








(using full-search) and prediction errors in all scaling and wavelet images for three 
resolution levels. 

Step 2. At the coarsest resolution level (j = 3), compute the mean ($m_i$) and standard 
deviation ($\sigma_i$) of the absolute values of the prediction errors in each subband image 
(i = 1, 2, 3, 4) and determine the threshold values $E_i$ as ($m_i + \sigma_i$) for detection of 
candidate points in the coarse boundary region. 

Step 3. Apply the threshold $E_i$ to detect all candidate points in the boundary regions of 
moving objects in each subband image at level 3. Label all connected candidate 
points to determine candidate regions of moving boundaries. 

Step 4. Choose a threshold for the size of connected regions, determine the significant 
boundary subregions $R_{j,i,q}$ in that subband image, and construct a bounding 
rectangle for each subregion. 

Step 5. Merge the object boundary subregions detected in the wavelet images with those 
detected in the scaling image. Construct a larger rectangular neighborhood $N_{qi}$ for 
each bounding rectangle obtained in the scaling image, and test the intersection of each 
bounding rectangle obtained in the wavelet images with each rectangular neighborhood 
$N_{qi}$ constructed in the scaling image. If they intersect, keep the corresponding subregion 
$R_{j,i,q}$; if not, eliminate it. Merge all retained boundary subregions. 

Step 6. Perform the morphological closing operation to obtain boundary regions of 
moving objects at this particular resolution level. 
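Steps 2, 5 and 6 can be sketched as follows. Several details are assumptions made for illustration only: the population standard deviation is used, rectangles are (top, left, bottom, right) tuples, the enlargement `margin` of the neighborhood is a hypothetical parameter, and the closing uses a 3x3 structuring element.

```python
from statistics import mean, pstdev

def detection_threshold(err):
    """Step 2: mean plus one standard deviation of the absolute prediction errors."""
    mags = [abs(v) for row in err for v in row]
    return mean(mags) + pstdev(mags)

def expand(rect, margin):
    r0, c0, r1, c1 = rect
    return (r0 - margin, c0 - margin, r1 + margin, c1 + margin)

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_subregions(scaling_rects, wavelet_rects, margin=2):
    """Step 5: keep wavelet-image rectangles that meet an enlarged scaling-image rectangle."""
    hoods = [expand(r, margin) for r in scaling_rects]
    kept = [w for w in wavelet_rects if any(intersects(w, h) for h in hoods)]
    return scaling_rects + kept

def _window(mask, r, c):
    rows, cols = len(mask), len(mask[0])
    return [mask[y][x]
            for y in range(max(r - 1, 0), min(r + 2, rows))
            for x in range(max(c - 1, 0), min(c + 2, cols))]

def closing(mask):
    """Step 6: binary dilation followed by erosion (windows are clipped at the border)."""
    rs, cs = range(len(mask)), range(len(mask[0]))
    dilated = [[any(_window(mask, r, c)) for c in cs] for r in rs]
    return [[all(_window(dilated, r, c)) for c in cs] for r in rs]
```

The closing fills small gaps between retained subregions, so the merged rectangles' contents form connected boundary bands rather than scattered fragments.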



4 Experimental Results 

Experiments were performed using three video sequences: Clair [Fig. 3 and 4, image 
size 288x320], Table Tennis [Fig. 5, image size 224x352], and Salesman [Fig. 6, 
image size 233x320]. An almost shift-invariant wavelet transform using the biorthogonal 
wavelet BIOR(2,2) and linear interpolation with an oversampling rate of 3 was used. 
The decomposition filter coefficients are proportional to h = {-1, 2, 6, 2, -1} and 
g = {-1, 2, -1}. Video frames were decomposed to the third resolution level. The 
blocksize used for motion vector estimation at the third level was either 4x4 or 2x2. 
BIOR(2,2) was used in the experiments because of its short support and symmetry. 

Fig. 3(a) and 3(b) show two successive frames of the first clip from the Clair 
sequence. Fig. 3(g) displays the absolute values of the prediction errors in the scaling 
image (LL) at resolution level 3, where the locations of large values reflect the 
moving object boundaries at that resolution level. In this experiment, a matching 
blocksize of 4x4 was used. The prediction errors were thresholded and connected 
components labeled to yield one significant component of a moving boundary, enclosed in a 
bounding rectangle which is overlaid on the scaling image as shown in Fig. 3(c). 
Similar processing was done on each of the LH, HL and HH wavelet images at 
resolution level 3; the results are shown in Fig. 3(d), (e), and (f). These results were 
combined with the result from the scaling image, as indicated by the four overlaid rectangles 
shown in Fig. 3(h). After a morphological closing operation on the composite of these 
detected subregions, we extracted the moving object boundary regions at resolution 
level 3, which are given by the bright regions around Clair's head as shown in 
Fig. 3(i). 

Fig. 4 shows the experimental results for the second clip of the Clair sequence, 
taken at a different time. Fig. 4(d) displays the extracted object boundary 
regions when the matching blocksize 4x4 was used, and Fig. 4(f) displays the result 
obtained when the matching blocksize was 2x2; the latter gave better boundary 
extraction. Note that the blocksize 2x2 at level 3, although small, corresponds to the 
blocksize 16x16 at the original level (in our case, with interpolation), which is widely 
used in standard block matching. Fig. 5 shows the experimental results obtained 
on the Table Tennis sequence, which contains two separate moving objects. Fig. 5(d) 
gives the extracted moving boundaries when the 4x4 matching blocksize was used, while 
Fig. 5(f) gives the result when the 2x2 matching blocksize was used. Again, the 2x2 
blocksize appeared to yield a slightly better result. A different video sequence, 
Salesman, was also used for the experiment, where the matching blocksize of 4x4 was 
tested. In this case, the connected components of large prediction errors in the scaling 
image at level 3 yielded three significant components of moving boundaries, and 
the experimental result is shown in Fig. 6. 



5 Conclusion 

We have presented a simple wavelet-based preprocessing method for the extraction of 
boundary regions of moving objects in a video sequence. It operates on the prediction 
errors associated with block-matching motion estimation in the scaling and 
wavelet images at a coarse resolution level, so the computation is fast. 
The preprocessing yields coarse boundary regions of video objects, from which a 
refined extraction can be developed. This brings us one step closer to automatic 
moving object segmentation for use in MPEG-4. Two different sizes of the matching 
block were tested; the smaller blocksize appears to give better boundary 
extraction. 



Acknowledgment 

This work is supported in part by a grant from the Pittsburgh Digital Greenhouse in 
collaboration with OKI Semiconductor. 







References 

1. T. Meier and King N. Ngan, "Automatic Segmentation of Moving Objects for 
Video Object Plane Generation", IEEE Trans. on Circuits and Systems for Video 
Technology, Vol. 8, No. 5, pp. 525-538, September 1998. 

2. D. Wang, "Unsupervised Video Segmentation Based on Watersheds and Temporal 
Tracking", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 
8, No. 5, pp. 539-546, September 1998. 

3. M. R. Razaee, P. M. J. van der Zwet, B. P. F. Lelieveldt, R. J. van der Geest, and J. 
H. C. Reiber, "A Multiresolution Image Segmentation Technique based on Pyramidal 
Segmentation and Fuzzy Clustering", IEEE Trans. on Image Processing, 
Vol. 9, No. 7, pp. 1238-1248, July 2000. 

4. M. A. Al-Mohimeed and C. C. Li, "Motion estimation and compensation based 
on almost shift-invariant wavelet transform for image sequence coding", International 
Journal of Imaging Systems and Technology, Vol. 9, No. 4, pp. 214-229, 
1998. 

5. C. Chui, X. Shi, and A. Chan, "An oversampled frame algorithm for real-time 
implementation and applications", Proc. SPIE Conf. on Wavelet Applications, 
Orlando, FL, April 1994, Vol. 2242, pp. 272-301. 

6. L. Zheng, J. C. Liu, A. K. Chan, and W. Smith, "Object-Based Image Segmentation 
Using DWT/RDWT Multiresolution Markov Random Field", Proc. IEEE International 
Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ, March 
1999, Vol. 6, pp. 3485-3488. 

7. I. Kompatsiaris and M. G. Strintzis, "Spatiotemporal Segmentation and Tracking 
of Objects for Visualization of Videoconference Image Sequences", IEEE Trans. 
on Circuits and Systems for Video Technology, Vol. 10, pp. 1388-1402, Dec. 
2000. 

8. I. Koprinska and S. Carrato, "Temporal Video Segmentation: A Survey", Signal 
Processing: Image Communication, Vol. 16, pp. 477-500, 2001. 

9. M. Bagci, I. Yilmaz, M. H. Karci, T. Kolcak, U. Orguner, Y. Yardimci, M. 
Demirekler, and A. E. Cetin, "Moving Object Detection and Tracking in Video 
Based on Higher Order Statistics and Kalman Filtering", Proc. (CDROM) 2001 
IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing, Baltimore, 
MD, June 2001. 







Fig. 3. Experimental results on clip 1 of the Clair sequence (image size 288x320): (a) previous 
frame; (b) current frame; (c) a bounding rectangle of the moving boundary subregion obtained 
in the level 3 scaling image LL; (d), (e), (f) boundary bounding rectangles obtained in the level 3 
wavelet images LH, HL, and HH, respectively 










Fig. 3. (continued) (g) absolute prediction errors in the level 3 LL image; (h) a composite 
of bounding rectangles of boundary subregions detected from all scaling and wavelet 
images at level 3; (i) extracted coarse boundary regions of the moving objects 
(matching blocksize is 4x4) 








Fig. 4. Experimental results on clip 2 of the Clair sequence (image size 288x320): (a) 
previous frame, (b) current frame, (c) a composite of bounding rectangles of boundary 
subregions detected from all scaling and wavelet images at level 3 (matching 
blocksize: 4x4), (d) extracted coarse boundary regions of the moving object (matching 
blocksize: 4x4), (e) a composite of bounding rectangles of boundary subregions 
detected from all scaling and wavelet images at level 3 (matching blocksize: 2x2), (f) 
extracted coarse boundary regions of the moving object (matching blocksize: 2x2) 












Fig. 5. Experimental results on the Table Tennis sequence (image size 224x352): (a) previous 
frame, (b) current frame, (c) a composite of bounding rectangles of boundary subregions 
detected from all scaling and wavelet images at level 3 (matching blocksize: 4x4), (d) extracted 
coarse boundary regions of the moving objects (matching blocksize: 4x4), (e) a composite of 
bounding rectangles of boundary subregions detected from all scaling and wavelet images at 
level 3 (matching blocksize: 2x2), (f) extracted coarse boundary regions of the moving objects 
(matching blocksize: 2x2) 








Fig. 6. Experimental results on the Salesman sequence (image size 288x320): (a) previous frame, 
(b) current frame, (c) a composite of bounding rectangles of boundary subregions detected from 
all scaling and wavelet images at level 3, (d) extracted coarse boundary regions of moving 
objects (matching blocksize is 4x4) 




Embedded Zerotree Wavelet Coding of Image Sequence 



Mbainaibeye Jerome and Noureddine Ellouze 



Laboratoire de Systeme et Traitement du Signal (LSTS) 
Ecole Nationale d'Ingenieurs de Tunis 
BP 37, Tunis le Belvedere 1002, Tel: 874 700 
Jerome.mbai@enit.rnu.tn 
N.Ellouze@enit.rnu.tn 



Abstract. In this paper we present an image sequence coding system based on the 
Embedded Zerotree Wavelet algorithm (EZW). The difference between the image 
in the coder and the reconstructed previous image in the decoder is used as the 
technique for removing temporal redundancies. The first image is encoded 
in intra-mode by the EZW algorithm and a specific binary codebook CB1. The 
subsequent images in the sequence are encoded by taking the difference 
between the reconstructed previous image in the decoder and the current image 
in the coder; this difference (residual image) is then encoded by the EZW 
algorithm and a specific binary codebook CB2. Simulations are run on the 
Claire and Alexis sequences. The results show that the system can provide the 
best reconstruction quality, both objectively and subjectively, for a given 
minimum bit rate. Progressive transmission, rate control for constant bit rate and 
rate scalability are the main characteristics of this system. 



1. Introduction 

In multimedia applications, digital image compression is generally used for storage 
and transmission. MPEG-1, MPEG-2 and H.263 are standards used in moving image 
coding. MPEG-2 uses the DCT applied to blocks of 8 x 8 pixels, together with motion 
estimation and compensation. H.263 also uses the DCT, for low bit rates. Because 
the image is split into blocks, these standards produce image quality affected 
by blocking effects at low bit rates. Shapiro proposed the Embedded Zerotree Wavelet 
algorithm (EZW) for image compression [1], which uses dependencies among 
wavelet subbands [2]-[6]. This coder outperforms the current JPEG standard from 
low to high bit rates. Since then, many developments in image compression using the 
wavelet transform have been made [7]-[16]; improvements have been obtained by 
modifying EZW [10, 12, 16]. JPEG2000 is a new standard based on the wavelet 
transform [17]. B. J. Kim et al. extended Set Partitioning in Hierarchical Trees 
(SPIHT) to video sequences [18], exploiting the energy clustering property of 3D 
subband/wavelet coefficients. Despite the realization of the MPEG-4 and MPEG-7 
standards, the adoption of wavelets for video coding constitutes a special challenge. 
One can apply 2D wavelet coding in combination with motion compensation for 
temporal prediction, or one can consider the sequence as a three-dimensional array of 
data and compress it with 3D wavelet analysis. These approaches present 



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 65-75, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 







some difficulties that arise from a fundamental property of the discrete wavelet 
transform: it is a space-varying operator. 

In this paper, we present an image sequence coding system based on the EZW algorithm. 
Image sequences are characterized by great similarities between consecutive images 
(the term image in this paper is used to denote a frame). These similarities are known as 
temporal redundancies. Removing these temporal redundancies is the key 
technique for improving compression performance. In some standards, such as 
MPEG-1, MPEG-2 and H.263, this is performed by motion estimation and compensation, 
where displaced blocks are searched for and encoded to predict the current image from 
the previous one. In our approach, temporal redundancies are removed 
by calculating the difference between consecutive images in the 
sequence. The discrete wavelet transform is applied to the residual images, and the EZW 
algorithm is used for the encoding process. This paper is organized as follows: 
section 2 gives a short description of the EZW algorithm; section 3 presents the 
proposed image sequence coding system; results and discussions are presented in 
section 4; and the conclusion is presented in section 5. 



2. Embedded Zerotree Wavelet Coding 

The EZW algorithm encodes images in embedded fashion from their dyadic wavelet 
representations. The goal of embedded coding is to generate a single encoded bit 
stream that can be truncated to achieve any desired bit rate while giving the best 
reconstruction quality at that rate. In the wavelet domain, an image is represented by 
approximation coefficients (called the DC subband) and detail coefficients (called AC 
subbands). These coefficients are organized in trees. The trees are structured according 
to a rule such that a parent coefficient in an AC subband is related to four children in the 
next finer AC subband with the same orientation and the same spatial location. Only a parent 
coefficient in the DC subband is related to three children, one in each of the three coarsest 
AC subbands. The EZW algorithm encodes these coefficients by using a sequence of 
thresholds. The initial value of the threshold $T_0$ is defined such that $|C| < 2T_0$, where 
C is the maximum wavelet coefficient. A coefficient $X_t$ is significant if $|X_t| \ge T$. 
The significance map is obtained by scanning the wavelet coefficient matrix to decide 
whether or not each coefficient is significant, and it is generated in each bit plane. Two 
passes are performed for each threshold value: the dominant pass and the subordinate 
pass. All coefficients scanned in the dominant pass are encoded with four symbols: 
ZTR (Zerotree Root), IZ (Isolated Zero), POS (significant Positive) and 
NEG (significant Negative). The ZTR symbol is generated for an insignificant coefficient 
that has no significant children. The IZ symbol is generated for an insignificant 
coefficient that has at least one significant child. POS and NEG are generated for 
significant coefficients that are positive and negative, respectively. In the finest AC 
subbands, where the coefficients have no children, IZ and ZTR are merged into the Z 
(zero) symbol. The subordinate pass refines the quantized coefficients to obtain the 
best approximation of the wavelet coefficients. 
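The threshold rule and the dominant-pass symbols can be illustrated with a short sketch. It assumes the initial threshold is taken as the largest power of two satisfying the condition on $T_0$ (a common choice, not stated in the text), and it assumes the caller supplies the descendant values of each coefficient; the function names are illustrative.

```python
import math

def initial_threshold(coeffs):
    """Largest power of two T0 such that max|c| < 2*T0 (requires a non-zero coefficient)."""
    cmax = max(abs(c) for c in coeffs)
    return 2 ** math.floor(math.log2(cmax))

def dominant_symbol(value, descendants, threshold):
    """Dominant-pass symbol for one coefficient, given the values below it in the tree."""
    if abs(value) >= threshold:
        return "POS" if value > 0 else "NEG"
    if not descendants:
        return "Z"      # finest subband: IZ and ZTR are merged into Z
    if any(abs(d) >= threshold for d in descendants):
        return "IZ"     # insignificant, but something below it is significant
    return "ZTR"        # insignificant, and so is everything below it
```

After each pair of passes the threshold is halved, so every additional pass adds roughly one bit plane of precision to the coefficients already found significant.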

The EZW algorithm is particularly interesting for applications requiring rate and quality 
scalability, since the encoder and decoder can terminate the encoding and decoding 
process at any time to meet a target rate or target distortion. 







3. Coding of Image Sequences 

The general structure of an image sequence coding system is composed of an encoder and 
a decoder (figure 1). We briefly describe this system with reference to MPEG-2, where the 
orthogonal transform is the DCT and the entropy coding is variable length coding 
(VLC). In the encoder, blocks of the first image in the sequence are encoded in 
intra-mode without any reference: the DCT is applied to blocks of 8 x 8 pixels, and the 
quantized coefficients are then encoded using VLC to produce the bit stream. 
The subsequent images are encoded by prediction from the previous images using 
motion estimation and compensation. The motion estimation process tries 
to detect the displaced blocks between the current image and the previous image. 
These blocks are then encoded to predict the current image. For constant bit 
rate applications, a bit rate control algorithm is used to prevent buffer underflow or 
overflow. However, MPEG does not specify how to search for the displaced blocks; 
this is a detail that the system designer can choose to implement in one of many 
possible ways. The same holds for the bit rate control algorithm, where 
complexity-versus-quality trade-offs are left to the individual application. The 
decoder performs the inverse of the operations carried out by the encoder. 



Fig. 1. General structure of image sequence coding system: a) Encoder, b) Decoder 


















3.1 Proposed System 

Figure 2 shows the structure of the proposed image sequence coding system. Compared 
to figure 1, our system differs in the following respects: 

- The image is not split into blocks; 

- The wavelet transform is applied to the whole image; 

- The temporal redundancy removal is carried out on the whole image and not on 
blocks; 

- The encoding is realized for a limited channel, or in other words for a given level. 

The encoder contains three components: 

- The Discrete Wavelet Transform (DWT), which represents the image in the wavelet 
domain; 

- The Embedded Zerotree Wavelet Quantization (EZWQ), which quantizes the wavelet 
coefficients in embedded fashion and produces the EZW symbols; 

- The binary coding, which encodes the produced EZW symbols with specifically defined 
binary codes [19]. 

The decoder performs the inverse of the encoder's operations. 




Fig. 2. Proposed coding system: a) Encoder, b) Decoder 



In [19], we studied the probability distributions of the EZW symbols for 
standard images including Lena, Barbara, Mandrill, Goldhill and Peppers. 
These images are decomposed in the wavelet domain using the Daubechies biorthogonal 
9/7-tap wavelet filter bank [22], with five decomposition scales [20, 21]. The EZW 
algorithm is used to generate the different symbols described in section 2, and the 
probability distributions of these symbols are estimated. From these distributions, we 
defined binary codes for each symbol in each subband, and built a specific codebook, 
which we call CB1. Using CB1 in still image coding, the results obtained outperform 
the Flexible Zerotree Codec [21]. Given this performance, we have extended 
the probability distribution analysis of EZW symbols to image sequences. So, 















differences between consecutive images in the image sequences are calculated. Several 
image sequences, including Alexis, Claire, Mother & daughter and Salesman, are used in 
the experiments. These differences are then decomposed in the wavelet domain. 
Similarly, the EZW algorithm is used to generate the different symbols, and their 
probability distributions are estimated. From these distributions, we defined the 
binary codes for each symbol in each subband. A specific binary codebook for image 
differences, which we call CB2, is then built. Since the first image in the sequence is 
treated as a still image, it is encoded without any reference. The subsequent 
images are encoded by prediction. In our system, the first image is encoded using 
CB1 and the subsequent images in the sequence are encoded using CB2. 



3.2 Encoding Protocol 

Two configurations are analysed in terms of objective and subjective 
reconstruction quality. For the first configuration, the following steps are performed: 

1. The first image in the sequence (denoted by $X_0$) is decomposed in the wavelet 
domain and encoded using the EZW algorithm and CB1. The bit stream produced is 
considered as the reference bit stream; 

2. The difference between the current image $X_n$ and the previous image $X_{n-1}$ in the 
encoder frame memory is calculated to remove temporal redundancies; 

3. The residual image $D_n$ obtained is decomposed in the wavelet domain and encoded 
using EZW and CB2. The bit stream produced in this case is the residual bit stream; 

4. The current image is reconstructed by adding the residual image $D_n$ and the 
reconstructed previous image $\hat{X}_{n-1}$ in the decoder. 

The following expressions summarize the image difference calculation and the 
reconstruction process: 

$$D_n = X_n - X_{n-1} \qquad (3.1)$$

$$\hat{X}_n = \hat{X}_{n-1} + D_n \qquad (3.2)$$

Expression 3.1 provides the difference $D_n$ between the current image $X_n$ and the 
previous image $X_{n-1}$. Expression 3.2 gives the reconstruction $\hat{X}_n$ of the current 
image $X_n$ from the residual image $D_n$ and the reconstructed previous image $\hat{X}_{n-1}$. 
For the second configuration, only step 2 is changed: the difference is 
calculated between the current image and the reconstructed previous image in the decoder. 
Expression 3.1 becomes: 

$$D'_n = X_n - \hat{X}_{n-1} \qquad (3.3)$$
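The practical difference between the two configurations can be seen in a toy simulation. This sketch replaces the wavelet/EZW stage with uniform rounding as a stand-in for any lossy coder, and treats each image as a flat list of pixel values; both simplifications, and all names, are assumptions for illustration only.

```python
def quantize(values, step):
    """Stand-in for lossy coding: round every value to a multiple of `step`."""
    return [step * round(v / step) for v in values]

def code_sequence(frames, step, closed_loop):
    """Return the decoder's reconstruction of every frame."""
    recon = quantize(frames[0], step)          # first image, coded without reference
    out = [recon]
    for n in range(1, len(frames)):
        # First configuration: predict from the original previous image (3.1).
        # Second configuration: predict from the decoder's reconstruction (3.3).
        ref = recon if closed_loop else frames[n - 1]
        residual = quantize([x - p for x, p in zip(frames[n], ref)], step)
        recon = [p + d for p, d in zip(recon, residual)]   # expression 3.2
        out.append(recon)
    return out

def max_error(frames, recons):
    """Largest absolute pixel error of each reconstructed frame."""
    return [max(abs(x - r) for x, r in zip(f, rc)) for f, rc in zip(frames, recons)]
```

With the first configuration the quantization errors of successive residuals add up in the decoder, because the encoder never sees them; with the second configuration each residual also carries the previous error, so the error stays within the quantizer's own bound.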







4. Experimental Results and Discussions 

Simulations are run on the Claire and Alexis sequences. Decomposition is 
performed using the 9/7 filter bank [27]. The images are rescaled to 256 x 256 pixels 
before decomposition. To reproduce an image from the received binary symbols, the 
output bit stream includes seven bytes of header information: four bytes for the horizontal 
and vertical dimensions of the image, one byte for the filter bank, one byte for the 
number of decomposition levels and one byte for the initial threshold. Since the horizontal 
and vertical dimensions, the filter bank, and the number of decomposition levels are the 
same for the residual image, only the initial threshold can change; so a one-byte header is 
included in the residual bit stream to tell the decoder to update the initial 
threshold. Figure 3 shows PSNR versus image number for the Claire sequence at 56 
Kbits/s, where only the first 36 images are reconstructed. The curve labelled "Serie 1" 
is the result for the second configuration and the curve labelled "Serie 2" is the result 
for the first configuration. 
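A seven-byte header of this kind could be laid out as in the sketch below. The field order, the big-endian 16-bit encoding of each dimension, and storing the initial threshold as a power-of-two exponent are all assumptions made for illustration; the text only gives the byte counts.

```python
import struct

# Assumed layout: width (2 bytes), height (2 bytes), filter id, levels, threshold exponent.
HEADER = struct.Struct(">HHBBB")

def pack_header(width, height, filter_id, levels, threshold_exp):
    """Build the 7-byte header for the reference bit stream."""
    return HEADER.pack(width, height, filter_id, levels, threshold_exp)

def unpack_header(data):
    """Read the header fields back from the start of a bit stream."""
    return HEADER.unpack(data[:HEADER.size])

def pack_residual_header(threshold_exp):
    """Residual bit streams only carry the updated initial threshold (1 byte)."""
    return struct.pack(">B", threshold_exp)
```

For example, `pack_header(256, 256, 1, 5, 10)` would describe a 256 x 256 image with a hypothetical filter id 1, five decomposition levels and an initial threshold of 2 to the power 10.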




Fig. 3. Claire sequence at 56 Kbits/s and 10 fps 
Serie 1: second configuration result 
Serie 2: first configuration result 

Figure 3 shows that the reconstruction quality decreases over time in the first
configuration, where coding uses the difference between the current image $X_n$ and
the previous image $X_{n-1}$ (which is assumed to be transmitted without any loss or
quantization error). In reality, the previous image available in the decoder is
affected by quantization error, so the difference between $X_n$ and $X_{n-1}$ does
not capture all the information separating the decoder's previous image from the
current one. Since the reconstruction is performed using expression 3.2 and the bit
rate is limited to 56 Kbits/s, there are not enough bits to correct this error. This
is the main reason for the observed degradation in quality.





We then repeated the encoding process using the second configuration: the Claire
and Alexis sequences were encoded at 56 Kbits/s and 10 fps. Figures 4 and 5
show PSNR versus image number.




Fig. 4. Claire sequence at 56 Kbits/s and 10 fps 
Mean PSNR: 35.98 dB 




Fig. 5. Alexis sequence at 56 Kbits/s and 10 fps 
Mean PSNR: 38.28 dB 






Figures 4 and 5 show that the system provides good objective reconstruction
quality: average PSNRs of 35.98 dB and 38.28 dB are reached for the Claire and
Alexis sequences, respectively. Figures 6 and 7 show original and reconstructed
images. For the Claire sequence, figure 6 A and B are the original and
reconstructed image 61, and figure 6 C and D are the original and reconstructed
image 134. For the Alexis sequence, figure 7 E and F are the original and
reconstructed image 20, and figure 7 G and H are the original and reconstructed
image 33. The reconstruction is performed at 56 Kbits/s and 10 fps, which
corresponds to an average compression ratio of 94. Despite this compression ratio,
the system also provides good subjective reconstruction quality, and there are no
block effects in the reconstructed images. Since the system keeps the progressive
encoding and decoding property of the EZW algorithm, it is robust against the loss
of information: if the encoder stops the encoding process, the decoder can still
reconstruct the sequence from the bit stream received so far. Furthermore, the
sequence can be encoded at maximum quality (lossless compression) by transmitting
at a high bit rate.
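The quoted compression ratio can be checked with a short calculation. The sketch assumes 8-bit grayscale images and reads 56 Kbits/s as 56,000 bits per second; neither assumption is stated explicitly in the text.

```python
def compression_ratio(width, height, bits_per_pixel, fps, channel_bps):
    """Ratio of the raw video bit rate to the channel bit rate."""
    raw_bps = width * height * bits_per_pixel * fps
    return raw_bps / channel_bps


# 256 x 256 pixels, 8 bits per pixel (assumed), 10 images per second, 56 Kbits/s
ratio = compression_ratio(256, 256, 8, 10, 56_000)
print(round(ratio, 1))  # 93.6
```

Under these assumptions the raw rate is 5,242,880 bits/s, which gives a ratio of about 93.6, consistent with the rounded value of 94 reported above.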





Fig. 6. Reconstruction results of Claire sequence at 56 Kbits/s and 10 fps 
A: original image 61, B: reconstructed image 61 
C: original image 134, D: reconstructed image 134 








Fig. 7. Reconstruction results of Alexis sequence at 56 Kbits/s and 10 fps 
E: original image 20, F: reconstructed image 20 
G: original image 33, H: reconstructed image 33 



5. Conclusion 

In this paper, we have presented an image sequence coding system based on the EZW
algorithm and binary coding. The difference between the current image in the coder
and the reconstructed previous image in the decoder is used to remove temporal
redundancies. The residual image is then decomposed in the wavelet domain and
encoded. Specific binary codebooks are built and used in the encoding process.
Experimental results show that the system provides good reconstruction quality, both
objectively and subjectively. The performance of the system is explained by the fact
that temporal redundancies are removed between the current image and the previous
image as reconstructed in the decoder. This enables the encoder to minimize the
overall distortion due to quantization error and improves the reconstruction
quality.






References 

[1] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients”,
IEEE Trans. on Signal Processing, Vol. 41, No. 12, pp. 3445-3462, Dec. 1993.

[2] I. Daubechies, “Orthonormal bases of compactly supported wavelets”,
Communications on Pure and Applied Mathematics, Vol. 41, pp. 909-996, Nov. 1988.

[3] S. Mallat, “A theory for multi-resolution signal decomposition: the wavelet
representation”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 11,
pp. 674-693, July 1989.

[4] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.

[5] J. D. Villasenor, B. Belzer, and J. Liao, “Wavelet filter evaluation for image
compression”, IEEE Trans. on Image Processing, Vol. 4, No. 8, pp. 1053-1060,
Aug. 1995.

[6] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press,
Wellesley, MA, 1996.

[7] A. Zandi, J. D. Allen, E. L. Schwartz, and M. Boliek, “CREW: Compression with
Reversible Embedded Wavelets”, IEEE Data Compression Conference, pp. 212-221,
Snowbird, Mar. 1995.

[8] A. Said and W. A. Pearlman, “An image multi-resolution representation for
lossless and lossy compression”, IEEE Trans. on Image Processing, Vol. 5, No. 9,
pp. 1303-1310, Sep. 1996.

[9] Y. Chen and W. A. Pearlman, “Three-dimensional subband coding of video using
zerotree method”, Proc. SPIE, Visual Communications and Image Processing,
pp. 1302-1309, Orlando, Mar. 1996.

[10] A. Said and W. A. Pearlman, “A new fast and efficient image codec based on set
partitioning in hierarchical trees”, IEEE Trans. on Circuits and Systems for Video
Technology, Vol. 6, No. 3, pp. 243-250, Jun. 1996.

[11] S. A. Martucci, I. Sodagar, T. H. Chiang, and Y. Q. Zhang, “A zerotree wavelet
coder”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1,
pp. 109-118, Feb. 1997.

[12] J. Li, P. Cheng, and C. Kuo, “On the improvement of embedded zerotree wavelet
coding”, Proc. SPIE, Visual Communications and Image Processing, pp. 1490-1501,
Orlando, Apr. 1995.

[13] H. Man, F. Kossentini, and M. Smith, “Robust EZW image coding for noisy
channels”, IEEE Signal Processing Letters, Vol. 4, No. 8, pp. 227-229, Aug. 1997.

[14] C. D. Creusere, “A new method for robust image compression based on the
embedded zerotree wavelet algorithm”, IEEE Trans. on Image Processing, Vol. 6,
No. 10, pp. 1436-1442, Oct. 1997.

[15] J. K. Rogers and P. C. Cosman, “Wavelet zerotree image compression with
packetization”, IEEE Signal Processing Letters, Vol. 5, No. 5, pp. 105-107, May 1998.

[16] S. Joo, H. Kikuchi, S. Sasaki, and J. Shin, “Flexible zerotree coding of wavelet
coefficients”, IEICE Trans. Fundamentals, Vol. E82-A, No. 4, Apr. 1999.

[17] M. W. Marcellin, M. J. Gormish, A. Bilgin, and M. P. Boliek, “An overview of
JPEG-2000”, Proc. IEEE Data Compression Conference, pp. 523-541, 2000.

[18] Beong-Jo Kim and W. A. Pearlman, “An embedded wavelet video coder using
three-dimensional set partitioning in hierarchical trees (SPIHT)”, Proc. DCC’97,
IEEE Data Compression Conference, pp. 251-260, Snowbird, UT, Mar. 1997.

[19] M. Jerome, “Optimal Image Coding based on Probability Distribution of
Embedded Zerotree Wavelet Symbols”, Tunisian-German Conference on Smart
Systems and Devices SSD, pp. 666-671, Hammamet, Tunisia, March 27-30, 2001.

[20] M. Jerome and N. Ellouze, “Etude energetique de l’analyse multi-resolution
d’images par ondelette”, Proc. JTEA’2000, Tome 1, pp. 103-109, Hammamet, Tunisia,
24-25 Mar. 2000.

[21] M. Jerome and N. Ellouze, “Image Wavelet Coefficients Quantization by
Embedded Zerotree Wavelet Algorithm”, Proc. ACIDCA’2000, International
Conference on Artificial and Computational Intelligence for Decision, Control and
Automation in Engineering and Industrial Applications, pp. 1-5, Monastir, 22-24
March 2000.

[22] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using
wavelet transform”, IEEE Trans. on Image Processing, Vol. 1, No. 2, pp. 205-220,
Apr. 1992.




Wavelet-Based Video Compression Using 
Long-Term Memory Motion-Compensated 
Prediction and Context-Based Adaptive 
Arithmetic Coding 



Detlev Marpe 1, Thomas Wiegand 1, and Hans L. Cycon 2

1 Image Processing Department
Heinrich-Hertz-Institute (HHI) for Communication Technology
Einsteinufer 37, 10587 Berlin, Germany
{marpe,wiegand}@hhi.de
2 University of Applied Sciences (FHTW Berlin)
Allee der Kosmonauten 20-22, 10315 Berlin, Germany
hcycon@fhtw-berlin.de



Abstract. In this paper, we present a novel design of a wavelet-based video
coding algorithm within a conventional hybrid framework of temporal
motion-compensated prediction and transform coding. Our proposed algorithm
involves the incorporation of multi-frame motion compensation as an effective
means of improving the quality of the temporal prediction. In addition, we
follow the rate-distortion optimizing strategy of using a Lagrangian cost
function to discriminate between different decisions in the video encoding
process. Finally, we demonstrate that context-based adaptive arithmetic coding
is a key element for fast adaptation and high coding efficiency. The combination
of overlapped block motion compensation and frame-based transform coding enables
blocking-artifact free and hence subjectively more pleasing video. In comparison
with a highly optimized MPEG-4 (Version 2) coder, our proposed scheme provides
significant performance gains in objective quality of 2.0-3.5 dB PSNR.

1 Introduction 

Multi-frame prediction [11] and variable block size motion compensation in a
rate-distortion optimized motion estimation and mode selection process [12,10]
are powerful tools to improve the coding efficiency of today’s video coding
standards. In this paper, we present the design of a video coder, dubbed DVC,
which demonstrates how most elements of the state-of-the-art in video coding
as currently implemented in the test model long-term [2] (TML8) of the ITU-T
H.26L standardization project can be successfully integrated in a
blocking-artifact free video coding environment. In addition, we provide a
solution for an efficient macroblock based intra coding mode within a
frame-based residual coding method, which is extremely beneficial for improving
the subjective quality as well as the error robustness.



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 76-86, 2001.
© Springer-Verlag Berlin Heidelberg 2001



We further explain how appropriately designed entropy coding tools, which
have already been introduced in some of our previous publications [6,7] and
which, in some modified form [5], are now part of TML8, help to improve the
efficiency of a wavelet-based residual coder.

In our experiments, we compared our proposed wavelet-based DVC coder
against an improved MPEG-4 coder [10], where both codecs were operated using
a fixed frame rate, fixed quantization step sizes and a search range of ±32
pels. We obtained coding results for various sequences showing that our
proposed video coding system yields a coding gain of 2.0-3.5 dB PSNR relative to
MPEG-4. Correspondingly, the visual quality provided by the DVC coder compared
to that of the block-based coding approach of MPEG-4 is much improved,
especially at very low bit rates.




Fig. 1. Block diagram of the proposed coding scheme



2 Overview of the DVC Scheme 

Fig. 1 shows a block diagram of the proposed DVC coder. As a hybrid system,
it consists of a temporal predictive loop along with a spatial transform coder.
Temporal prediction is performed by using a block motion estimation (BME) and
an overlapped block motion compensation (OBMC), such that the reference of
each predicted block can be obtained from a long-term reference frame memory.
Coding of the motion compensated P-frames as well as of the initial intra (I)
frame is performed by first applying a discrete wavelet transform (DWT) to an
entire frame. Uniform scalar quantization (Q) with a central dead-zone around
zero similar to that designed for H.263 is then used to map the dynamic range
of the wavelet coefficients to a reduced alphabet of decision levels. Prior to the
final arithmetic coding stage, the pre-coder further exploits redundancies of the
quantized wavelet coefficients in a 3-stage process of partitioning, aggregation
and conditional coding.
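The quantization step can be illustrated with a generic dead-zone quantizer. This is a minimal sketch and not the exact rule used in DVC or H.263: the truncation toward zero, the mid-point reconstruction offset and the function names are assumptions.

```python
def deadzone_quantize(coeff, step):
    """Map a coefficient to an integer level by truncating toward zero.

    Truncation makes the zero bin (-step, step) twice as wide as the other
    bins, which is what gives the quantizer its central dead-zone.
    """
    magnitude = int(abs(coeff) / step)
    return magnitude if coeff >= 0 else -magnitude


def dequantize(level, step, offset=0.5):
    """Reconstruct at the (assumed) mid-point of the decision interval."""
    if level == 0:
        return 0.0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) + offset) * step


levels = [deadzone_quantize(c, 4) for c in (3.9, -3.9, 9.0, -13.0)]
print(levels)                                  # [0, 0, 2, -3]
print([dequantize(q, 4) for q in levels])      # [0.0, 0.0, 10.0, -14.0]
```

Small coefficients on either side of zero collapse to level 0, which increases the number of insignificant coefficients available to the later zerotree aggregation stage.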











Table 1. Macroblock partition modes

Mode   Block Size   Partition
1      16 x 16      Leave MB as a whole
2      16 x 8       Split MB into 2 sub-blocks
3      8 x 16       Split MB into 2 sub-blocks
4      8 x 8        Split MB into 4 sub-blocks



3 Motion-Compensated Prediction 

3.1 Motion Model 

As already stated above, the motion model we used is very similar to that of
the H.26L TML8 design [2]. In essence it relies on a simple model of block
displacements with variable block sizes. Given a partition of a frame into
macroblocks (MB) of size 16 x 16 pels, each macroblock can be further sub-divided
into smaller blocks, where each sub-block has its own displacement vector. Our
model supports 4 different partition modes, as shown in Table 1.

Each macroblock may use a different reference picture out of a long-term
frame memory. In addition to the predictive modes represented by the 4 different
MB partition modes in Table 1, we allow for an additional macroblock-based
intra coding mode in P-frames. This local intra mode is realized by computing the
DC for each 8 x 8 sub-block of each spectral component (Y,U,V) in a macroblock
and by embedding the DC-corrected sub-blocks into the residual frame in a way
that is further described in the following section.



3.2 Motion Estimation and Compensation 

Block motion estimation is performed by an exhaustive search over all integer pel
positions within a pre-defined search window around the motion vector predictor,
which is obtained from previously estimated sub-blocks in the same way as in
TML8 [2]. In a number of subsequent steps, the best integer pel motion vector
is refined to the final fractional-pel accuracy by searching in a 3 x 3 sub-pel
window around the refined candidate vector. All search positions are evaluated by
using a Lagrangian cost function, which involves a rate and distortion term coupled
by a Lagrangian multiplier. For all fractional-pel displacements, distortion in
the transform domain is estimated by using the Walsh-Hadamard transform,
while the rate of the motion vector candidates is estimated by using a fixed,
pre-calculated table. This search process takes place for each of the 4 macroblock
partitions and each reference frame, and the cost of the overall best motion
vector candidate(s) of all 4 macroblock modes is finally compared against the
cost of the intra mode decision to choose the macroblock mode with minimum
cost.
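The mode decision described above amounts to minimizing a cost of the form J = D + lambda * R over the candidate modes. The sketch below shows only that final comparison; the mode names, the distortion and rate figures and the function names are hypothetical and do not come from the paper.

```python
def lagrangian_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_mode(candidates, lam):
    """Return the mode with minimum cost.

    candidates maps a mode name to a (distortion, rate_bits) pair, where the
    pair is assumed to be the best result already found for that mode.
    """
    return min(candidates, key=lambda mode: lagrangian_cost(*candidates[mode], lam))


# Hypothetical figures for one macroblock
candidates = {"16x16": (1200.0, 20), "8x8": (900.0, 80), "intra": (700.0, 200)}
print(choose_mode(candidates, lam=2.0))   # 8x8
print(choose_mode(candidates, lam=10.0))  # 16x16
```

A larger multiplier penalizes rate more heavily, so the decision shifts toward modes that spend fewer bits on motion vectors and side information.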








Fig. 2. (a) 1-D profile of 2-D weighting functions along the horizontal or vertical
axes of two neighboring overlapping blocks. (b) 2-D weighting function



The prediction error luminance (chrominance) signal is formed by the
weighted sum of the differences between all 16 x 16 (8 x 8) overlapping blocks
from the current frame and their related overlapping blocks with displaced
locations in the reference frame, which have been estimated in the BME stage for the
corresponding core blocks. In the case of an intra macroblock, we compute the
weighted sum of the differences between the overlapping blocks of the current
intra blocks and their related DC-values. As a weighting function w, we used the
’raised cosine’, as shown in Fig. 2. For a support of N x N pels, it is given by

$w(n, m) = w_n \cdot w_m, \qquad w_n = \frac{1}{2}\left[1 - \cos\frac{2\pi n}{N}\right] \quad \text{for } n = 0, \ldots, N.$    (1)

In our presented approach, we choose N = 16 (N = 8) for the luminance
(chrominance, resp.) in Eq. (1), which results in a 16 x 16 (8 x 8) pixel support
centered over a ”core” block of size 8 x 8 (4 x 4) pels for the luminance
(chrominance, resp.). For the texture interpolation of sub-pel positions, the same
filters as specified in TML8 [2] have been used.
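A raised-cosine window of the form w_n = (1/2)[1 - cos(2 pi n / N)] has the property that windows shifted by half their support sum to one, which is what makes it suitable for overlapped block motion compensation. The sketch below checks this numerically. It samples the window at n = 0, ..., N-1, which is an assumption about the indexing, and the function names are illustrative.

```python
import math


def raised_cosine(N):
    """1-D raised-cosine weights w_n = 0.5 * (1 - cos(2*pi*n/N)), n = 0..N-1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / N)) for n in range(N)]


def weight_2d(N):
    """Separable 2-D weighting function w(n, m) = w_n * w_m."""
    w = raised_cosine(N)
    return [[wn * wm for wm in w] for wn in w]


N = 16                      # luminance support; core blocks are N/2 = 8 pels apart
w = raised_cosine(N)
W = weight_2d(N)
half = N // 2

# Two neighboring 1-D windows overlap by N/2 samples and add up to one.
overlap_1d = [w[n] + w[n + half] for n in range(half)]

# In 2-D, four windows cover each pixel and their weights add up to one.
overlap_2d = [
    W[n][m] + W[n + half][m] + W[n][m + half] + W[n + half][m + half]
    for n in range(half) for m in range(half)
]
print(max(abs(v - 1.0) for v in overlap_1d + overlap_2d) < 1e-12)  # True
```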



4 Wavelet Transform 



In wavelet-based image compression, the so-called 9/7-wavelet with compact
support [3] is the most popular choice. Our proposed coding scheme, however,
utilizes a class of biorthogonal wavelet bases associated with infinite impulse
response (IIR) filters, which was recently constructed by Petukhov [9]. His
approach relies on the construction of a dual pair of rational solutions of the
matrix equation

$M(z)\,\tilde{M}^T(z^{-1}) = 2I,$

where I is the identity matrix, and

$M(z) = \begin{pmatrix} h(z) & h(-z) \\ g(z) & g(-z) \end{pmatrix}, \qquad \tilde{M}(z) = \begin{pmatrix} \tilde{h}(z) & \tilde{h}(-z) \\ \tilde{g}(z) & \tilde{g}(-z) \end{pmatrix}$    (2)

are so-called ‘modulation matrices’.



Fig. 3. From left to right: scaling function of analysis, analyzing wavelet, scaling
function of synthesis, and synthesizing wavelet used for I-frame coding (top row)
and P-frame coding (bottom row)

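In [9], a one-parametric family of filters $h_a$, $\tilde{h}_a$, $g_a$ and $\tilde{g}_a$ satisfying Eq. (2)
was constructed (see footnote 1):

$h_a(z) = \frac{1}{\sqrt{2}}\,(1 + z),$    (3)

$\tilde{h}_a(z) = \frac{(2 + a)(z^{-1} + 3 + 3z + z^2)(z^{-1} + b + z)}{4\sqrt{2}\,(2 + b)(z^{-2} + a + z^2)},$    (4)

$g_a(z) = \frac{(2 + a)(z^{-1} - 3 + 3z - z^2)(-z^{-1} + b - z)}{4\sqrt{2}\,(2 + b)},$    (5)

$\tilde{g}_a(z) = \frac{1}{\sqrt{2}} \cdot \frac{z - 1}{z^{-2} + a + z^2},$    (6)

where $b = \frac{4a - 8}{6 - a}$, $|a| > 2$, $a \neq 6$.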

To adapt the choice of the wavelet basis to the nature and statistics of the
different frame types of intra and inter mode, we performed a numerical
simulation on this one-parametric family of IIR filter banks yielding the optimal
value of a = 8 for intra frame mode and a = 25 for inter frame mode in Eqs. (3)-(6).
Graphs of these optimal basis functions are presented in Fig. 3. Note that the
corresponding wavelet transforms are efficiently realized with a composition of
recursive filters [9].
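The perfect-reconstruction condition of Eq. (2) can be checked numerically for the two parameter values quoted above. The sketch rests on several assumptions, because the printed equations are partly illegible in this copy: the filters are taken as h(z) = (1 + z)/sqrt(2), h~(z) = (2 + a)(1/z + 3 + 3z + z^2)(1/z + b + z) / [4 sqrt(2) (2 + b)(z^-2 + a + z^2)], g(z) = (2 + a)(1/z - 3 + 3z - z^2)(-1/z + b - z) / [4 sqrt(2) (2 + b)] and g~(z) = (z - 1) / [sqrt(2) (z^-2 + a + z^2)], with b = (4a - 8)/(6 - a). The relation for b and the sign of g~ are chosen so that Eq. (2) holds and should be checked against [9].

```python
import cmath
import math


def make_filters(a):
    """Return (h, h_dual, g, g_dual) as functions of complex z (assumed forms)."""
    b = (4 * a - 8) / (6 - a)           # assumed relation, see lead-in
    s2 = math.sqrt(2)

    def den(z):
        return z ** -2 + a + z * z

    def h(z):
        return (1 + z) / s2

    def h_dual(z):
        num = (2 + a) * (1 / z + 3 + 3 * z + z * z) * (1 / z + b + z)
        return num / (4 * s2 * (2 + b) * den(z))

    def g(z):
        num = (2 + a) * (1 / z - 3 + 3 * z - z * z) * (-1 / z + b - z)
        return num / (4 * s2 * (2 + b))

    def g_dual(z):
        return (z - 1) / (s2 * den(z))

    return h, h_dual, g, g_dual


def modulation_product(a, z):
    """Evaluate M(z) * M_dual^T(1/z) as a 2 x 2 nested list."""
    h, h_dual, g, g_dual = make_filters(a)
    zi = 1 / z
    m = [[h(z), h(-z)], [g(z), g(-z)]]
    m_dual_t = [[h_dual(zi), g_dual(zi)], [h_dual(-zi), g_dual(-zi)]]
    return [[sum(m[i][k] * m_dual_t[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def deviation_from_2I(a, samples=12):
    """Largest entry-wise deviation from 2*I over points on the unit circle."""
    worst = 0.0
    for k in range(samples):
        z = cmath.exp(1j * (0.1 + 2 * math.pi * k / samples))
        p = modulation_product(a, z)
        for i in range(2):
            for j in range(2):
                worst = max(worst, abs(p[i][j] - (2 if i == j else 0)))
    return worst


print(deviation_from_2I(8) < 1e-9, deviation_from_2I(25) < 1e-9)
```

The denominators stay away from zero on the unit circle because z^-2 + a + z^2 equals a + 2 cos(2 theta) there, which cannot vanish when |a| > 2.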



5 Pre-coding of Wavelet Coefficients 

For encoding the quantized wavelet coefficients, we follow the conceptual ideas 
initially presented in [6] and later refined in [7]. Next, we give a brief review of 
the involved techniques. For more details, the readers are referred to [6,7]. 



1 h a and g a denote low-pass and high-pass filters of the decomposition algorithm, 

respectively, while h a and g a denote the corresponding filters for reconstruction. 







Fig. 4. Schematic representation of the pre-coder used for encoding the
quantized wavelet coefficients



5.1 Partitioning 

As shown in the block diagram of Fig. 4, an initial ‘partitioning’ stage divides
each frame of quantized coefficients into three sub-sources: a significance map,
indicating the position of significant coefficients, a magnitude map holding the
absolute values of significant coefficients, and a sign map with the phase
information of the wavelet coefficients. Note that all three sub-sources inherit the
subband structure from the quantized wavelet decomposition, so that there is
another partition of each sub-source according to the given subband structure.
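The partitioning stage can be sketched for a single band of quantized coefficients. The data layout (nested lists, scan in row order, sign stored as 0 for positive and 1 for negative) and the function names are assumptions for illustration; the actual scan order and representation used by the pre-coder are described in [6,7].

```python
def partition(coeffs):
    """Split a band of quantized coefficients into three sub-sources."""
    significance, magnitudes, signs = [], [], []
    for row in coeffs:
        significance.append([1 if c != 0 else 0 for c in row])
        for c in row:
            if c != 0:                      # only significant coefficients
                magnitudes.append(abs(c))
                signs.append(1 if c < 0 else 0)
    return significance, magnitudes, signs


def merge(significance, magnitudes, signs):
    """Inverse of partition: rebuild the band from the three sub-sources."""
    values = iter(zip(magnitudes, signs))
    band = []
    for row in significance:
        out_row = []
        for flag in row:
            if flag:
                mag, negative = next(values)
                out_row.append(-mag if negative else mag)
            else:
                out_row.append(0)
        band.append(out_row)
    return band


band = [[0, 3, 0], [-1, 0, 0], [0, 0, 2]]
sig, mags, sgn = partition(band)
print(sig, mags, sgn)
```

Because magnitudes and signs are stored only for significant positions, the significance map has to be decoded first, which matches the coding order described later in the text.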



5.2 Zerotree Aggregation 

In a second stage, the pre-coder performs an ‘aggregation’ of insignificant
coefficients using a quad-tree related data structure. These so-called zerotrees [4,6]
connect insignificant coefficients, which share the same spatial location along the
multiresolution pyramid. However, we do not consider zerotree roots in bands
below the maximum decomposition level. In inter-frame mode, coding efficiency
is further improved by connecting the zerotree root symbols of all three lowest
high-frequency bands to a so-called ’integrated’ zerotree root which resides in
the LL-band.



5.3 Conditional Coding 

The final ‘conditioning’ part of the pre-coding stage supplies the elements of
each source with a ‘context’, i.e., an appropriate model for the actual coding
process in the arithmetic coder. Fig. 5 (a) shows the prototype template used
for conditioning of elements of the significance map. In the first part, it consists
of a causal neighborhood of the actual coding event C, which depends on the
scale and orientation of a given band. Except for the lowest frequency bands, the
template uses additional information of the next upper level (lower resolution)
represented by the neighbors of the parent P of C, thus allowing a ’prediction’ of
the non-causal neighborhood of C. The processing of the lowest frequency band
depends on the intra/inter decision. In intra mode, mostly non-zero coefficients
are expected in the LL-band, so there is no need for coding a significance map.
For P-frames, however, we indicate the significance of a coefficient in the
LL-band by using the four-element kernel of our prototype template (Fig. 5 (a)),
which is extended by the related entry of the significance map belonging to the
previous P-frame.

Fig. 5. (a) Two-scale template (white circles) with an orientation dependent
design for conditional coding of an event C of the significance map; V, H:
additional element used for vertical and horizontal oriented bands, respectively.
(b) 8-neighborhood of significance used for conditioning of a given magnitude C

The processing of subbands is performed band-wise in the order from lowest
to highest frequency bands and the partitioned data of each band is processed
such that the significance information is coded (and decoded) first. This allows
the construction of special conditioning categories for the coding of magnitudes
using the local significance information. Thus, the actual conditioning of
magnitudes is performed by classifying magnitudes of significant coefficients
according to the local variance estimated by the significance of their 8-neighborhood
(cf. Fig. 5 (b)). For the conditional coding of sign maps, we are using a context
built of two preceding signs with respect to the orientation of a given band [7].

For coding of the LL-band of I-frames, the proposed scheme uses a DPCM-like
procedure with a spatially adaptive predictor and a backward driven
classification of the prediction error using a six-state model.

6 Binarization and Adaptive Binary Arithmetic Coding 

All symbols generated by the pre-coder are encoded using an adaptive binary
arithmetic coder, where non-binary symbols like magnitudes of coefficients or
motion vector components are first mapped to a sequence of binary symbols by
means of the unary code tree. Each element of the resulting ”intermediate”
codeword given by this so-called binarization will then be encoded in the subsequent
process of binary arithmetic coding.
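Unary binarization can be sketched in a few lines. The bit polarity (ones terminated by a zero), the restriction to non-negative integers and the function names are assumptions; the text does not say how signs or offsets are handled before binarization.

```python
def unary_binarize(value):
    """Map a non-negative integer to `value` ones followed by one zero."""
    if value < 0:
        raise ValueError("unary code is defined for non-negative integers")
    return [1] * value + [0]


def unary_debinarize(bits):
    """Decode a concatenation of unary codewords back into integers."""
    values, run = [], 0
    for bit in bits:
        if bit == 1:
            run += 1
        else:                   # a zero terminates the current codeword
            values.append(run)
            run = 0
    return values


stream = unary_binarize(3) + unary_binarize(0) + unary_binarize(2)
print(stream)                    # [1, 1, 1, 0, 0, 1, 1, 0]
print(unary_debinarize(stream))  # [3, 0, 2]
```

Each position in the codeword is a separate binary decision, so the arithmetic coder can attach its own probability model to each one.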

At the beginning of the overall encoding process, the probability models
associated with all different contexts are initialized with a pre-computed start
distribution. For each symbol to be encoded, the frequency count of the related
binary decision is updated, thus providing a new probability estimate for the next
coding decision. However, when the total number of occurrences of symbols related
to a given model exceeds a pre-defined threshold, the frequency counts are scaled
down. This periodic rescaling exponentially weighs down past observations and
helps to adapt to non-stationarities of a source.
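The count-based adaptation with periodic rescaling can be sketched as follows. The initial counts, the threshold, the halving rule and the class name are assumptions for illustration; the start distributions and thresholds actually used are not given in the text.

```python
class AdaptiveBinaryModel:
    """Frequency-count probability model for one context (illustrative)."""

    def __init__(self, count0=1, count1=1, limit=64):
        self.counts = [count0, count1]   # assumed start distribution
        self.limit = limit               # assumed rescaling threshold

    def probability(self, bit):
        """Current estimate of P(bit) used for the next coding decision."""
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        """Count the coded bit; halve all counts when the total exceeds the limit."""
        self.counts[bit] += 1
        if sum(self.counts) > self.limit:
            # Halving (rounded up, never below 1) weighs down older observations.
            self.counts = [max(1, (c + 1) // 2) for c in self.counts]


model = AdaptiveBinaryModel(limit=8)
for _ in range(7):
    model.update(1)
print(model.counts, model.probability(1))  # [1, 4] 0.8
```

Keeping every count at 1 or more ensures that no binary decision is ever assigned zero probability, which an arithmetic coder could not encode.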

For intra and inter frame coding we use separate models. Consecutive
P-frames as well as consecutive motion vector fields are encoded using the updated
related models of the previous P-frame and motion vector field, respectively. The
binary arithmetic coding engine used in our presented approach is a
straightforward implementation similar to that given in [13].

7 Experimental Results 

7.1 Test Conditions 

To illustrate the effectiveness of our proposed coding scheme, we used an
improved MPEG-4 coder [10] as a reference system. This coder follows a
rate-distortion (R-D) optimized encoding strategy by using a Lagrangian cost
function, and it generates bit-streams compliant with MPEG-4, Version 2 [1]. Most
remarkable is the fact that this encoder provides PSNR gains in the range from
1.0-3.0 dB, when compared to the MoMuSys reference encoder (VM17) [10].
For our experiments, we used the following encoding options of the improved
MPEG-4 reference coder:

— —pel motion vector accuracy enabled 

— Global motion compensation enabled 

— Search range of ±32 pels 

— 2 B-frames inserted (IBBPBBP . . .) 

— MPEG-2 quantization matrix used 

For our proposed scheme, we have chosen the following settings: 

— ^-pel motion vector accuracy 

— Search range of ±32 pels around the motion vector predictor 

— No B-frames used ( IPPP . . .) 

— Five reference pictures were used for all sequences except for the ’News’- 

sequence (see discussion of results below) 

Coding experiments were performed by using the test sequences ’Foreman’ 
and ’News’ both in QCIF resolution and with 100 frames at a frame rate of 10 
Hz. Only the first frame was encoded as an I-frame; all subsequent frames were 
encoded as P-frames or B-frames. For each run of a sequence, a set of quantizer 
parameters according to the different frame types (I,P,B) was fixed. 






Fig. 6. Average Y-PSNR against bit-rate using the QCIF test sequence ’Foreman’
at a frame rate of 10 Hz



Fig. 7. Average Y-PSNR against bit-rate using the QCIF test sequence ’News’
at a frame rate of 10 Hz






Fig. 8. Comparison of subjective reconstruction quality: Frame no. 22 of
’Foreman’ at 32 kbit/s. (a) DVC reconstruction (b) MPEG-4 reconstruction. Note
that the MPEG-4 reconstruction has been obtained by using a de-blocking filter


7.2 Test Results 

Figs. 6-7 show the average PSNR gains obtained by our proposed DVC scheme
relative to the MPEG-4 coder for the test sequences ’Foreman’ and ’News’,
respectively. For the ’Foreman’-sequence, significant PSNR gains of 2.0-2.5 dB
on the luminance component have been achieved (cf. Fig. 6). Figure 8 shows a
comparison of the visual quality for a sample reconstruction at 32 kbit/s. The
results we obtained for the ’News’-sequence show dramatic PSNR improvements
of about 2.5-3.5 dB. To demonstrate the ability of using some a priori knowledge
about the scene content, we checked for this particular sequence, in addition to
the five most recent reference frames, one additional reference frame 50 frames
back in the past according to the repetition of parts of the scene content. By
using the additional reference frame memory for this particular test case, we
achieved an additional gain of about 1.5 dB PSNR on the average compared to
the case where the reference frame buffer was restricted to the five most recent
reference frames only.

8 Conclusions and Future Research 

The coding strategy of DVC has proven to be very efficient. PSNR gains of up
to 3.5 dB relative to a highly optimized MPEG-4 coder have been achieved.
However, it should be noted that in contrast to the MPEG-4 coding system,
no B-frames were used in the DVC scheme, although it can be expected that
DVC will benefit from the usage of B-frames in the same manner as the MPEG-4
coder, i.e., depending on the test material, additional PSNR improvements of up
to 2 dB might be achievable. Another important point to note, when comparing
the coding results of our proposed scheme to that of the highly R-D optimized
MPEG-4 encoder used for our experiments, is the fact that up to now, we did
not incorporate any kind of high-complexity R-D optimization method. We did not
even optimize the motion estimation process with respect to the overlapped
motion compensation, although it is well known that conventional block motion
estimation is far from being optimal in an OBMC framework [8]. Furthermore,
we believe that the performance of our zerotree-based wavelet coder can be
further improved by using an R-D cost function for a joint optimization of the
quantizer and the zerotree-based encoder. Thus, we expect another significant
gain by exploiting the full potential of encoder optimizations inherently present
in our DVC design. This topic will be a subject of our future research.

References 

1. ISO/IEC JTC1/SC29 14496-2 MPEG-4 Visual, Version 2.

2. Bjontegaard, G. (ed.): H.26L Test Model Long Term Number 8 (TML8), ITU-T
SG 16 Doc. VCEG-N10 (2001)

3. Cohen, A., Daubechies, I., Feauveau, J.-C.: Biorthogonal Bases of Compactly
Supported Wavelets, Comm. on Pure and Appl. Math., Vol. 45 (1992) 485-560

4. Lewis, A., Knowles, G.: Image Compression Using the 2D Wavelet Transform,
IEEE Trans. on Image Processing, Vol. 1, No. 2 (1992) 244-250

5. Marpe, D., Blättermann, G., Wiegand, T.: Adaptive Codes for H.26L, ITU-T SG
16 Doc. VCEG-L13 (2001)

6. Marpe, D., Cycon, H. L.: Efficient Pre-Coding Techniques for Wavelet-Based Image
Compression, Proceedings Picture Coding Symposium 1997, 45-50

7. Marpe, D., Cycon, H. L.: Very Low Bit-Rate Video Coding Using Wavelet-Based
Techniques, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9,
No. 1 (1999) 85-94

8. Orchard, M. T., Sullivan, G. J.: Overlapped Block Motion Compensation: An
Estimation-Theoretic Approach, IEEE Trans. on Image Processing, Vol. 3, No. 5
(1994) 693-699

9. Petukhov, A. P.: Recursive Wavelets and Image Compression, Proceedings
International Congress of Mathematicians 1998

10. Schwarz, H., Wiegand, T.: An Improved MPEG-4 Coder Using Lagrangian Coder
Control, ITU-T SG 16 Doc. VCEG-M49 (2001)

11. Wiegand, T., Zhang, X., Girod, B.: Long-Term Memory Motion-Compensated
Prediction, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 1
(1999) 70-84

12. Wiegand, T., Lightstone, M., Mukherjee, D., Campbell, T. G., Mitra, S. K.:
Rate-Distortion Optimized Mode Selection for Very Low Bit Rate Video Coding and
the Emerging H.263 Standard, IEEE Trans. on Circuits and Systems for Video
Technology, Vol. 6, No. 2 (1996) 182-190

13. Witten, I., Neal, R., Cleary, J.: Arithmetic Coding for Data Compression,
Communications of the ACM, Vol. 30 (1987) 520-540



Wavelets and Fractal Image Compression Based on Their
Self-Similarity of the Space-Frequency Plane of Images



Yoshito Ueno

Graduate School of Engineering, SOKA University
1-236 Tangi-cho Hachioji-shi Tokyo 192-8577, Japan
ueno@iss.soka.ac.jp



Abstract. This paper presents a fusion scheme for Wavelets and Fractal
image compression based on the self-similarity of the space-frequency
plane of sub-band encoded images. Various kinds of Wavelet transform
are examined for their self-similarity characteristics and evaluated
for their suitability for Fractal modeling. The aim of this paper is
to reduce the information of the two sets of blocks involved in
Fractal image compression by using the self-similarity of images. In
addition, the new video encoder using the fusion method of Wavelets and
Fractal works in a manner similar to the motion compensation
technique of the MPEG encoder. Computer simulations show almost the
same PSNR and bit rate as a conventional Fractal image encoder,
depending on the sample images.



1 Introduction 

Ordinary video compression methods based on the DCT (Discrete Cosine Transform)
have been standardized for N-ISDN networks and mobile communications. However,
the DCT usually produces so-called block noise and mosquito noise when encoding
is performed. Therefore, JPEG 2000 adopts the Wavelet transform instead of the
DCT to reduce these artifacts. After applying the Wavelet transform to an
image, we can observe self-similarity in the transformed image and can apply
Fractal image coding to it.

This paper presents a fusion method of the Wavelet transform and Fractal coding
and reports experimental results for still images. In addition, I propose a new
video compression algorithm that encodes the inter-frame image by mapping each
range block of frame N onto the same range block of frame (N+1).



2 Self-Similarity of Wavelet Transformed Images 

Generally, an image that has been down-sampled in 3 stages by multi-resolution
analysis using the Wavelet transform is split into a lower frequency region and
higher frequency regions. In particular, the higher frequency components of the
image are self-similar across frequency bands in the horizontal, vertical and
diagonal directions. This self-similarity can be derived from the correlation
coefficients between the lower frequency components and their corresponding
higher frequency components. [1] Let us apply the multi-resolution analysis to
an image (N x N pixels) using the Wavelet transform.

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 87-98, 2001.
© Springer-Verlag Berlin Heidelberg 2001

First, the row pixels of the image are decomposed into the space-frequency
domain by the multi-resolution analysis. We write the lower frequency
components at resolution level j and position n as
$\{c_n^{j} : n = 1, 2, \ldots, N/2\}$ and the higher frequency components at
the same level and the same position as $\{d_n^{j} : n = 1, 2, \ldots, N/2\}$.

The original row of pixels is taken to be $\{c_k^{0} : k = 1, 2, \ldots, N\}$.
According to the multi-resolution analysis, the image is decomposed to level
(j+1) by the following equations.




$$c_n^{j+1} = \frac{1}{2} \sum_{k=0}^{m} p_k \, c_{k+2n}^{j}, \qquad
d_n^{j+1} = \frac{1}{2} \sum_{k=0}^{m} q_k \, c_{k+2n}^{j}$$



Here, j is the resolution level (j = 0, 1, 2, ...).

The parameters $p_k$ and $q_k$ are given by the following recursive (two-scale)
relations.

$$\varphi(x) = \sum_{k=0}^{m} p_k \, \varphi(2x - k) \qquad (1)$$

$$\psi(x) = \sum_{k=0}^{m} q_k \, \varphi(2x - k) \qquad (2)$$

Here, $\varphi(x)$ and $\psi(x)$ are nonzero only within the interval from 0 to m.

Here m is an odd number and, in general, the decomposition is carried out over
3 or 5 octaves of down-sampling.

Likewise, the same multi-resolution analysis is performed on the column pixels,
and the space-frequency domain is quantized.

After the multi-resolution analysis, the correlation parameter $\rho_r$ between
sub-bands of row pixels is given by the following equation.

The correlation parameter $\rho_c$ between sub-bands of column pixels is
obtained in the same manner.
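
A minimal sketch of how such a correlation parameter between two sub-bands could be computed, assuming it is the ordinary Pearson correlation coefficient of the two coefficient sequences. That choice, and the function name `correlation`, are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length coefficient lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Covariance and variances, all without the common 1/len factor.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat block: correlation is undefined, reported as 0 here
    return cov / sqrt(var_a * var_b)
```

Applied to the rows of two sub-bands this would give a row-wise value, and applied to their columns a column-wise value.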

2.1 Self-Similarity of Images after Wavelet Transform Using Haar Basis Filter 

The image is decomposed and down-sampled by the two-dimensional Wavelet
transform using the orthogonal Haar basis filter. Each sub-band is treated as
an independent image, and the sub-bands are arranged into horizontal, vertical
and diagonal groups as shown in Fig. 1.




Fig. 1 Grouping all sub-bands in the horizontal, vertical and diagonal directions 



The correlation coefficients between the image components within these groups
are usually very high. This is because the Wavelet transform with the
orthogonal Haar basis takes the mean of adjacent pixels, which yields the
space-frequency plane at the next lower resolution, and repeats this n times
recursively.

More precisely, each stage applies a finite difference operation after
averaging the adjacent pixels.
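
The averaging and differencing step can be sketched as one level of a two-dimensional Haar decomposition. This is an illustrative sketch only: the 1/2 normalization, the sub-band naming (row filter first, then column filter) and the function names are assumptions, and the image side length is assumed to be even.

```python
def haar_rows(img):
    """Average and difference adjacent pixel pairs along each row."""
    low, high = [], []
    for row in img:
        pairs = list(zip(row[0::2], row[1::2]))
        low.append([(a + b) / 2 for a, b in pairs])   # mean of neighbours
        high.append([(a - b) / 2 for a, b in pairs])  # finite difference
    return low, high


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def haar2d_level(img):
    """One 2-D Haar level; returns the sub-bands (LL, HL, LH, HH)."""
    low, high = haar_rows(img)
    # Filter the columns by transposing, filtering rows, and transposing back.
    ll, lh = (transpose(x) for x in haar_rows(transpose(low)))
    hl, hh = (transpose(x) for x in haar_rows(transpose(high)))
    return ll, hl, lh, hh
```

Calling `haar2d_level` again on the returned LL band gives the next octave, which is the recursion described above.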

The operation of deriving the correlation coefficients is shown in Fig.2. 












Fig. 2 Method for extracting the histogram of correlation coefficients among sub-bands

As shown in Fig. 2, the 4 components of the frequency band $SB_n$ are compared
with the 16 components of the corresponding frequency band $SB_{n-1}$; their
correlation coefficients are computed and accumulated into a frequency
distribution.

The resulting frequency distribution of the correlation coefficients for the
Lena image (512 x 512 pixels, 8 bits per pixel) is shown in Fig. 3.

[Fig. 3 plot: histogram of the correlation coefficients between SBn and SBn-1
(Lena image); horizontal axis from -1 to 1]

Fig. 3 Self-Similarity among sub-bands of image after transform using Haar basis filter 

Therefore, Fractal coding is feasible by exploiting the high correlation
coefficients of the horizontal, vertical and diagonal components of the image,
respectively.

Other Wavelet basis filters were also examined by measuring the correlation
coefficients among sub-bands: the 4-tap Daubechies filter (N = 2), which has a
sharp frequency cut-off, and the 12-tap Coiflet filter (N = 4), which has
linear phase characteristics.

However, with these filters the frequency distribution of the correlation
coefficients was concentrated around zero, because of their longer taps and the
resulting dispersion of the Wavelet parameters.

2.2 Self-Similarity among the Same Bands by Wavelet Packet Transform 

The Wavelet transform recursively decomposes only the lower frequency bands of
an image in the space-frequency domain. Therefore, when this fixed partition of
the space-frequency plane by the multi-resolution analysis does not match the
space-frequency characteristics of the image, the Wavelet transform becomes an
inefficient decomposition.

On the other hand, the Wavelet packet transform can adaptively decompose any
sub-band of the image, without being restricted to the lower frequency bands.
With such an adaptive octave decomposition, the parts of the same sub-band have
higher correlation coefficients. For example, the four partitions of one
sub-band are highly correlated. When these four child packets keep their
positional relation to the parent packet, this relation of location can be
encoded in Fractal form. [1] In addition, the Wavelet packet transform can use
linear orthogonal basis filters and can decompose images quickly using
single-tree algorithms.



3 Reflexive Encoder Using Self-Similarity of Images 

Image compression with a Fractal encoder requires a large amount of computation
because it searches for self-similarity among subsets of the whole image. [2]
Therefore, a Fractal encoder has been proposed that exploits the
self-similarity among the sub-bands of the horizontal, vertical and diagonal
groups, respectively, after the octave decomposition by the Wavelet transform.

Furthermore, a Fractal encoder can also exploit the self-similarity within the
same sub-band when the Wavelet packet transform is used.



3.1 Domain Decision of Pixels on the Wavelet Space-Frequency Plane 

To apply the Fractal encoder, the range blocks of each sub-band on the
space-frequency plane after the multi-resolution analysis are classified into
three domains: the shade region, the mid-range region and the edge region.

Let us calculate the frequency count F(R) of the pixel value R in a range
block, and classify the block using the average (expected) value of the
frequency count and the edge search width w in the sub-band.

1. Shade region: all pixels in the range block have the same value.

$$F(R) = RS \times RS \qquad (4)$$



2. Mid-range region: all pixel values lie around the expected value of the
frequency count, within the edge search width w.



R + 



F(R) # w 



(5) 



3. Edge region: all blocks that satisfy neither of the above conditions.

Alternatively, the range blocks can be partitioned into shade and edge regions
only, by comparing the variance $\sigma^2$ of the pixel values, calculated
inside each range block, with a threshold level.

1. If $\sigma^2 \le$ Threshold, the range block belongs to the shade region.

2. If $\sigma^2 >$ Threshold, the range block belongs to the edge region.
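
The variance test can be sketched as follows. This is a minimal illustration under stated assumptions: the population variance is used, the threshold value is left to the caller, and the function name `classify_block` is hypothetical.

```python
def classify_block(block, threshold):
    """Label a range block 'shade' or 'edge' from the variance of its pixels."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    # Population variance of the pixel values inside the block.
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return "shade" if variance <= threshold else "edge"
```

A perfectly flat block has zero variance and is therefore always labelled as shade, which agrees with the first classification rule above.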

3.2 Fractal Image Encoding 

In general, a Fractal image encoder is similar to vector quantization and
encodes by means of iterated functions: the image is divided into blocks, and
each block is approximated through a code book of transforms derived from the
image itself. Each transform is described by a conversion (mapping) term and a
linear term.

The transform code is obtained by mapping a scaled-down block at a different
resolution onto the block being coded.

For the Fractal encoder, we can use the IFS (Iterated Function Systems) method,
which models self-similar images. [2] For each range block on the Wavelet
space, the domain blocks that are actually self-similar to it are searched, and
the affine transform is applied to obtain the scaled-down image. The search
range is called the domain pool. To reduce the amount of computation in IFS
encoding, the three major parameters, namely range blocks, domain blocks and
domain pools, are selected by exploiting the correlation among adjacent
sub-bands.

The target range block $R_{ij}$ (RS x RS) and the domain block $D_{ij}$
(DS x DS), with DS = 2RS, are extracted from the smallest frequency band
$SB_n$, excluding the sub-band $SB_4$ on the Wavelet space.

According to the partial self-similarity of images, four range blocks
$R_{i+k,\,j+l}$ and one domain pool $D_{ij}$ (4RS x 4RS) can be selected from
the next frequency band $SB_{n-1}$, with the lower-left block as the reference
position, as shown in Fig. 4.




[Fig. 4 diagram: sub-bands SB3 and SB2]

Fig. 4 The selection method of each block on the wavelet space

Then, the domain block $D_{ij}$ corresponding to the four range blocks
$R_{i+k,\,j+l}$ is selected from the domain pool.
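
The selection of a domain block from a domain pool can be sketched as an exhaustive search. The sketch below is illustrative and makes several assumptions that are not stated in the paper: the domain block is shrunk by 2 x 2 averaging, the shift is chosen so that the block means match, the squared error is the matching criterion, and the symmetry transforms are left out. The scaling set is the one quoted in the simulation conditions; all function names are hypothetical.

```python
SCALES = (0.5, 0.6, 0.7, 0.8, 0.9)  # quantised brightness scalings


def shrink(block):
    """Average 2x2 neighbourhoods: a DS x DS block becomes RS x RS (DS = 2RS)."""
    n = len(block) // 2
    return [[(block[2 * i][2 * j] + block[2 * i][2 * j + 1]
              + block[2 * i + 1][2 * j] + block[2 * i + 1][2 * j + 1]) / 4
             for j in range(n)] for i in range(n)]


def flatten(block):
    return [p for row in block for p in row]


def best_match(range_block, pool, ds=4, step=2):
    """Return (error, row, col, scale, shift) of the best domain block in pool."""
    r = flatten(range_block)
    r_mean = sum(r) / len(r)
    best = None
    for i in range(0, len(pool) - ds + 1, step):
        for j in range(0, len(pool[0]) - ds + 1, step):
            d = flatten(shrink([row[j:j + ds] for row in pool[i:i + ds]]))
            d_mean = sum(d) / len(d)
            for s in SCALES:
                g = r_mean - s * d_mean  # shift that matches the block means
                err = sum((s * x + g - y) ** 2 for x, y in zip(d, r))
                if best is None or err < best[0]:
                    best = (err, i, j, s, g)
    return best
```

Restricting `pool` to a small neighbourhood of the adjacent sub-band, rather than the whole image, is what keeps the number of candidate blocks small.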

These steps are applied to the next frequency band in turn, and the search is
repeated over the whole image. The process can be described by the following
equation.

$$R = \tau(\alpha \cdot D_{ij} + g) \qquad (6)$$

Here, R: the pixel values inside the range cell, $\tau$: the mapping transform,
$\alpha$: the brightness scaling, $D_{ij}$: the pixel values inside the domain
cell, g: the amount of shift. Then,



Rv v Rv° Ri*k,j#i, and Du v D UJ 0 D 



ij 



( 7 ) 



Here, $R_{ij}$: the set of range cells
$D_{ij}$: the domain pool

At the next frequency band, the same affine transform holds as well.



( 8 ) 

With this block selection method, the IFS encoder first encodes the higher
sub-band images ($SB_n$) with MRR; the range blocks and domain pools of the
next sub-band ($SB_{n-1}$) are then determined from the positional relation
between each range block and its corresponding domain block. In this scheme,
the domain pool of the sub-bands at the highest decomposition level is assumed
for all bands.

In general, as regards the search area for block selection, this method
minimizes the number of domain pools and range blocks to be searched, because
the higher frequency components of the Wavelet space lie in the higher
sub-bands.

After encoding, the range blocks of each frequency band and the transformed
domain blocks have high correlation coefficients. When the information they
carry is redundant, keeping only one of the two removes the overlap. The
mechanism for reducing the quantity of information by this cell selection
method is shown in Fig. 5.




Fig.5 The reduction method of amounts of block information by cell selection 







For example, when domain block No. 24 is dropped, the original domain block
No. 2 or the initial domain block No. 0 can be extracted instead and used to
reproduce the necessary information. Finally, at the transmitter side, the
coded stream is obtained by entropy coding the parameters of the scaled-down
images and the positions of the domain cells for each range cell. However, the
lowest frequency band ($SB_4$) with high resolution is coded with 8-bit linear
quantization.

At the receiver side, starting from an arbitrary initial image, we find for
each range block the specific domain block to which the affine transform, the
brightness scaling, the brightness shift and the symmetry transform are to be
applied.

With this transform, we can derive the pixel values of the range block from the
domain block.

This process is repeated until the mean square error between successively
reproduced images falls below a threshold, at which point the process has
converged. The final reproduced image is then obtained.



3.3 Experimental Results 

The simulation conditions are as follows:

Range block size, RS = 2: the same as the tap length of the Haar basis filter.
Domain block size, DS = 4: its step size is 2.
Brightness scaling: one value is selected from {0.5, 0.6, 0.7, 0.8, 0.9} and
quantized.

The PSNR is given by the following equation.

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{N^2} \sum (img - img')^2} \qquad (9)$$

img: the original image
img': the reproduced image
$\frac{1}{N^2} \sum (img - img')^2$: the mean square error between img and img'
N: the size of the image, N x N pixels

The reproduced image after 14 iterations is shown in Fig. 6, and the PSNR is
about 31.9 dB.
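
For reference, the PSNR of an 8-bit image can be computed as in the sketch below. It uses the standard definition with a peak value of 255; the function name `psnr` and the list-of-rows image format are illustrative choices.

```python
from math import log10


def psnr(original, reproduced, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, reproduced)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)  # mean square error over all pixels
    if mse == 0:
        return float("inf")  # identical images
    return 10 * log10(peak ** 2 / mse)
```

An error of one grey level at every pixel gives roughly 48.13 dB with this definition.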




[Fig. 6 panels: 1) Original image; 2) Regenerated image]

Fig. 6 The regenerated image after 14 iterations (PSNR = 31.9 dB)



Next, the amount of computation of this process was examined. The algorithm
reduces the search time for the following reason.

The sub-bands of the same group on the Wavelet space are self-similar and
highly correlated, so the search can be restricted to an area of a specific
size: the range blocks and the domain blocks are taken from adjacent frequency
bands, and the domain pool is created separately from the range blocks.

Let the range block size be 2 x 2 pixels, the domain block size be 4 x 4
pixels, the domain pool size be 64 x 64 pixels, and the step size of the domain
blocks be 2 pixels. The domain pools are placed without overlapping each other.

We can now count the number of searches.

The number of range blocks inside one domain pool is $2^{10}$. The number of
domain blocks to be searched for one range block is $30^2$.

When the image size is 512 x 512 pixels, the number of domain pools inside the
image is $2^6$.

Therefore, the number of domain block searches with the usual method reaches
$2^{10} \times 30^2 \times 2^6 = 58{,}982{,}400$.

In the proposed method, on the other hand, the domain blocks at the highest
decomposition level are searched first, for each of the horizontal, vertical
and diagonal directions on the Wavelet space.

At decomposition level 3, the size of the domain pool of the sub-band is the
same as in the usual method.

Therefore, the number of domain block searches is $2^{10} \times 30^2$.
Then, at decomposition level 2, the number of domain blocks inside the domain
pool for a group of 4 range blocks is $4 \times 49$.

There are $2^{10}$ such range block groups, so the number of domain block
searches is $4 \times 49 \times 2^{10}$.

Therefore, over all three directions, the total number of domain block searches
is

$$3 \times \{2^{10} \times 30^2 + 4 \times 49 \times 2^{10} \times (1 + 2^2)\} = 5{,}775{,}360.$$







Finally, the number of searches, that is, the amount of computation, is reduced
to about 1/10 of that of the usual IFS method.
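
The two totals can be checked with a few lines of arithmetic. In this sketch the level-1 term (four times as many range block groups as at level 2) is an assumption, inferred because it is the value that reproduces the quoted total of 5,775,360; the function names are illustrative.

```python
def conventional_searches(image=512, pool=64, rs=2, candidates_per_axis=30):
    """Search count for the usual IFS method with the block sizes in the text."""
    range_blocks = (pool // rs) ** 2   # 2^10 range blocks per domain pool
    pools = (image // pool) ** 2       # 2^6 domain pools in the image
    return range_blocks * candidates_per_axis ** 2 * pools


def proposed_searches():
    """Search count for the sub-band method, summed over three directions."""
    level3 = 2 ** 10 * 30 ** 2         # same pool size as the usual method
    level2 = 4 * 49 * 2 ** 10          # 2^10 groups of four range blocks
    level1 = 4 * 49 * 2 ** 12          # assumed: four times as many groups
    return 3 * (level3 + level2 + level1)


if __name__ == "__main__":
    ratio = proposed_searches() / conventional_searches()
    print(conventional_searches(), proposed_searches(), round(ratio, 3))
```

The ratio of the two counts is a little under 0.1, in line with the stated reduction to about one tenth.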

4 Video Compression Method Using Wavelet and Fractal 
Encoder 

Several Fractal video compression methods have recently been investigated. A
video sequence encoder using 3D domain blocks and range cells has been
proposed. An inter-frame mapping encoder, which takes the previous frame as the
domain pool and needs no iteration to converge, has also been investigated.

The circular previous-frame estimation encoder approximates each range cell by
a domain block mapped from the previous frame. [3]

In general, the video objects in the frames of one scene can be regarded as
continuous images related by their motion vectors. Therefore, after scene cut
detection [4], the several frames of a cut can be reduced to one average frame
by the Wavelet transform and Fractal encoding.

Then, inter-frame Fractal encoding can be carried out in the same manner as
motion compensation, exploiting the high correlation coefficients among these
frames.

Finally, each succeeding frame is encoded by the Wavelet transform and the
Fractal encoder with reference to the average image produced by the Fractal
encoder.

The sequence of this process is shown in Fig. 7.

As the figure shows, the next cut is processed by the Wavelet transform and
Fractal encoding in the same way as a still image.

The first four frames form one group and are processed by Fractal encoding
after the affine transform.

Let the range block of the k-th frame be given, and search for the
corresponding domain block of the previous circular frame by the following
equation.

( 10 ) 

Here, / i: the proper position of the domain block 

C: the constant block that the value of all pixels is 1 

O: the orthogonal function operator (Fractal still image encoder) 

S: the contrast scaling of mapping, -1 < S < 1
$O_i$: the DC component of the range block

According to the above equation, the initial frame after the Wavelet transform
is taken as the previous frame, and the succeeding frame is then encoded by the
Fractal scheme with it as the domain pool.







At the decoding side, the four circular frames are reproduced by applying the
mapping to an arbitrary initial image. The regenerated image is then decoded
iteratively using the range blocks and domain blocks through this average
regenerated image. The process continues until the regenerated image converges.
Then, the contrast and the luminance of each block are compensated by the
parameters $S_i$ and $O_i$.

Furthermore, every parameter of the above equation has to be quantized and
assigned its own bits. In addition, entropy coding such as Huffman coding can
increase the compression ratio of this kind of encoder.

Consequently, the Wavelet transform and the Fractal encoder can realize video
compression at lower bit rates.




[Fig. 7 diagram: Wavelet transform applied to cut #1 and cut #2]

Fig. 7 The principle of video coding by Wavelet transform and Fractal coding



5 Conclusion 

In conclusion, this paper presented a video compression scheme using the
Wavelet transform and a Fractal encoder, and examined images having high
correlation coefficients among adjacent sub-bands after the Wavelet transform
with the Haar basis filter.

Because this orthogonal filter has a short tap length, the reproduced image
showed some ridges in its edge regions, and some edges were observed in the
shade areas due to the accumulation of small quantization errors after the IFS
encoding.

In future work, the Johnson filter, which has linear phase characteristics,
should be examined; it may give better performance with this method.







Furthermore, with the Wavelet packet transform, the sizes of the range blocks
and domain blocks can be made variable, because the small signal parameters in
the higher frequency bands can reduce the distortions that affect the
regenerated images. The video compression scheme described above, which takes
the features of video into account, also needs to be investigated
experimentally.

In particular, the bit rate and coding efficiency of this new algorithm should
be evaluated.



References 

1. Stollnitz, E. J., et al.: Wavelets for Computer Graphics: A Primer, Part 1.
IEEE Computer Graphics and Applications, (May, 1995), 76-84

2. Jacquin, A. E.: Image Coding Based on a Fractal Theory of Iterated
Contractive Image Transformations. IEEE Trans. Image Processing, Vol. 1,
No. 1, (June, 1992), 18-30

3. Kim, C. S., et al.: Fractal Coding of Video Sequence using CPM and NCIM. 1st
New Video Media Technology Conference, (March, 1996), 72-76

4. Tonomura, Y., et al.: Video Handling based on Structured Information for
Hypermedia Systems. Proc. of International Conference on Multimedia
Information Systems, (1991), 333-344




Integration of 

Multivariate Haar Wavelet Series* 



Stefan Heinrich 1, Fred J. Hickernell 2, and Rong-Xian Yue 3

1 FB Informatik, Universität Kaiserslautern
PF 3049, D-67653 Kaiserslautern, Germany
heinrich@informatik.uni-kl.de
2 Department of Mathematics, Hong Kong Baptist University
Kowloon Tong, Hong Kong SAR, China
fred@hkbu.edu.hk
3 College of Mathematical Science, Shanghai Normal University
100 Guilin Road, Shanghai 200234, China
rxyue@online.sh.cn



Abstract. This article considers the error of integrating multivariate 
Haar wavelet series by quasi-Monte Carlo rules using scrambled digital 
nets. Both the worst-case and random-case errors are analyzed. It is 
shown that scrambled net quadrature has optimal order. Moreover, there 
is a simple formula for the worst-case error. 



1 Introduction 

Digital (t, m, s)-nets and (t, s)-sequences are popular low discrepancy point
sets used for quasi-Monte Carlo multidimensional quadrature [8,11]. In recent
years it has been shown that these sets are especially effective for
integrating multivariate Haar wavelet series. The convergence rate depends on
the decay rates of the wavelet series coefficients. This article reports recent
results by the authors and others. For proofs the reader is referred to the
references cited.

The following section defines the Hilbert space of multivariate Haar wavelet
series, $\mathcal{H}_{\mathrm{wav}}$. Section 3 describes constructions of
digital nets and sequences, and Section 4 defines the integration problem to be
studied. The main results are described in Section 5.

2 Function Spaces Spanned by Haar Wavelets 

The space, $\mathcal{H}_{\mathrm{wav}}$, of multivariate Haar wavelets studied
here was defined in [15]. The domain of interest is the unit cube, $[0,1)^s$,
where the dimension s is any positive integer. Let b be an integer greater than
one, and define the univariate basic wavelet functions as

$$\psi_\gamma(x) = b^{1/2} \, 1_{\lfloor bx \rfloor = \gamma} - b^{-1/2} \, 1_{\lfloor x \rfloor = 0}, \qquad \gamma = 0, 1, \ldots, b-1,$$

where $1_{\{\cdot\}}$ denotes the characteristic function, and
$\lfloor x \rfloor$ denotes the greatest integer less than or equal to x. For
each subset u of the coordinate axes $\{1, \ldots, s\}$, let $|u|$ denote the
cardinality of u. For each $r \in u$ let $\kappa_r$, $\tau_r$ and $\gamma_r$ be
integers with $\kappa_r \ge 0$, $0 \le \tau_r < b^{\kappa_r}$ and
$0 \le \gamma_r < b$. Define the vectors $\kappa = (\kappa_r)_{r \in u}$,
$\tau = (\tau_r)_{r \in u}$, and $\gamma = (\gamma_r)_{r \in u}$. Let
$\psi_{u\kappa\tau\gamma}$ be a product over $r \in u$ of the dilated and
translated wavelets, i.e.,

$$\psi_{u\kappa\tau\gamma}(\mathbf{x}) := b^{(|\kappa| - |u|)/2} \prod_{r \in u} \left( b \, 1_{\lfloor b^{\kappa_r + 1} x_r \rfloor = b\tau_r + \gamma_r} - 1_{\lfloor b^{\kappa_r} x_r \rfloor = \tau_r} \right), \qquad (1)$$

where $|\kappa| = \sum_{r \in u} \kappa_r$. For $u = \emptyset$ we take by
convention $\psi_{\emptyset\kappa\tau\gamma}(\mathbf{x}) = 1$.

* This work was partially supported by a Hong Kong Research Grants Council grant
HKBU/2030/99P, by Hong Kong Baptist University grant FRG/97-98/II-99, by
Shanghai NSF Grant 00JC14057, and by a Shanghai Higher Education STF Grant.

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 99-106, 2001.
© Springer-Verlag Berlin Heidelberg 2001

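The wavelets defined above are not orthogonal nor linearly independent, but
they are nearly so. As observed in [15],

$$\sum_{\gamma_r = 0}^{b-1} \psi_{u\kappa\tau\gamma}(\mathbf{x}) = 0, \qquad \forall r \in u, \ \forall u, \kappa, \tau, \gamma_{u - \{r\}},$$

$$\int_{[0,1)^s} \psi_{u\kappa\tau\gamma}(\mathbf{x}) \, \psi_{u'\kappa'\tau'\gamma'}(\mathbf{x}) \, d\mathbf{x} = \delta_{uu'} \, \delta_{\kappa\kappa'} \, \delta_{\tau\tau'} \prod_{r \in u} \left( \delta_{\gamma_r \gamma'_r} - b^{-1} \right),$$

where $\delta$ is the Kronecker delta function.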

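The space of integrands, $\mathcal{H}_{\mathrm{wav}}$, consists of all series
of wavelet functions (1) whose coefficients converge to zero quickly enough.
Let

$$\omega_{u\kappa} = \beta_u \, b^{-2\alpha|\kappa|} \quad \text{and} \quad \beta_u = \prod_{r \in u} \beta_r$$

for some $\alpha > 0$ and $\beta_r > 0$ for $r = 1, \ldots, s$. Then define the
scaled wavelets as
$\tilde\psi_{u\kappa\tau\gamma}(\mathbf{x}) = \omega_{u\kappa}^{1/2} \, \psi_{u\kappa\tau\gamma}(\mathbf{x})$.
The space of multivariate Haar wavelets may then be defined as:

$$\mathcal{H}_{\mathrm{wav}} = \Big\{ f(\mathbf{x}) = \sum_{u,\kappa,\tau,\gamma} \hat f_{u\kappa\tau\gamma} \, \tilde\psi_{u\kappa\tau\gamma}(\mathbf{x}) = \hat{\mathbf{f}}^T \tilde{\boldsymbol\psi}(\mathbf{x}) : \ \|\hat{\mathbf{f}}\|_2 < \infty \ \& \ \sum_{\gamma_r = 0}^{b-1} \hat f_{u\kappa\tau\gamma} = 0, \ \forall r \in u, \ \forall u, \kappa, \tau, \gamma_{u - \{r\}} \Big\},$$

where $\hat{\mathbf{f}}$ is a column vector of the coefficients
$\hat f_{u\kappa\tau\gamma}$ and $\tilde{\boldsymbol\psi}$ is a column vector
of the basis functions $\tilde\psi_{u\kappa\tau\gamma}$. Because the wavelets
are not linearly independent, the condition on the sum of the series
coefficients is required to ensure that the series expression for
$f \in \mathcal{H}_{\mathrm{wav}}$ is unique. The inner product and norm for
$\mathcal{H}_{\mathrm{wav}}$ are defined in terms of the scalar product and
$\ell_2$-norm of the coefficient vectors:

$$\langle f, g \rangle_{\mathcal{H}_{\mathrm{wav}}} = \hat{\mathbf{f}}^T \hat{\mathbf{g}}, \qquad \|f\|_{\mathcal{H}_{\mathrm{wav}}} = \|\hat{\mathbf{f}}\|_2 = \left( \hat{\mathbf{f}}^T \hat{\mathbf{f}} \right)^{1/2}.$$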






3 Digital Sequences 

One important family of low discrepancy sequences is the (t, s)-sequences in
base b [11]. Moreover, all general constructions of such sequences [1,10,12,16]
use the digital method [8,11,13]. Owen [14] proposed a method for randomly
scrambling (t, s)-sequences so that they are still (t, s)-sequences with
probability one. This random scrambling has been implemented by [7]. The
following definition describes the construction of a randomly scrambled digital
sequence in a prime base. A similar construction is possible for prime power
bases.

Definition 1 [7] Let $b \ge 2$ be a prime number, and
$\mathbb{Z}_b = \{0, 1, \ldots, b-1\}$. Let the following
$\infty \times \infty$ matrices and $\infty \times 1$ vectors all have elements
in $\mathbb{Z}_b$: predetermined generator matrices $C_1, \ldots, C_s$, lower
triangular scrambling matrices $L_1, \ldots, L_s$ with nonzero diagonal
elements, and shift vectors $e_1, \ldots, e_s$. For any non-negative integer
$i = \cdots i_3 i_2 i_1$ (base b), define the $\infty \times 1$ vector
$\psi(i)$ as the vector of its digits, i.e.,
$\psi(i) = (i_1, i_2, \ldots)^T$. For any point $z = 0.z_1 z_2 \cdots$ (base b)
$\in [0, 1)$, let $\phi(z) = (z_1, z_2, \ldots)^T$ denote the
$\infty \times 1$ vector of the digits of z. Then the scrambled digital
sequence in base b is $\{x_0, x_1, x_2, \ldots\}$, where each
$x_i = (x_{i1}, \ldots, x_{is}) \in [0, 1)^s$ is defined by

$$\phi(x_{ir}) = L_r C_r \psi(i) + e_r, \qquad r = 1, \ldots, s, \ i = 0, 1, \ldots, \qquad (2)$$

where all arithmetic operations in the above formula take place using
arithmetic modulo b.

The basic non-scrambled digital sequence takes $L_1 = \cdots = L_s = I$, and
$e_1 = \cdots = e_s = 0$. Owen's randomly scrambled sequence chooses the
elements of $L_1, \ldots, L_s, e_1, \ldots, e_s$ randomly, independently and
uniformly over their possible values. The function $\phi$ gives proper b-ary
expansions of its arguments, i.e., $\phi(z)$ cannot end in an infinite trail of
(b-1)s. Thus, the right side of (2) should not give a vector ending in an
infinite trail of (b-1)s almost surely. To ensure this, it is assumed that any
linear combination of columns of any $C_r$ cannot be a vector ending in an
infinite trail of (b-1)s.
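
A minimal sketch of this construction for a single coordinate, truncated to m digits so that the matrices become m x m. The truncation, the use of Python's `random` module for the scrambling, and all function names are assumptions made for illustration; this is not the implementation of [7].

```python
import random


def digits(i, b, m):
    """First m base-b digits of the integer i, least significant first."""
    out = []
    for _ in range(m):
        out.append(i % b)
        i //= b
    return out


def mat_vec(mat, vec, b):
    """Matrix-vector product with all arithmetic modulo b."""
    return [sum(a * v for a, v in zip(row, vec)) % b for row in mat]


def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]


def random_scramble(m, b, rng):
    """Random lower triangular m x m matrix over Z_b with nonzero diagonal."""
    return [[rng.randrange(1, b) if c == r else rng.randrange(b) if c < r else 0
             for c in range(m)] for r in range(m)]


def coordinate(i, gen, scr, shift, b):
    """Point in [0, 1) whose first m digits are L C digits(i) + e (mod b)."""
    m = len(gen)
    z = mat_vec(scr, mat_vec(gen, digits(i, b, m), b), b)
    z = [(d + e) % b for d, e in zip(z, shift)]
    return sum(d * b ** -(k + 1) for k, d in enumerate(z))
```

With an identity generator matrix, identity scrambling and zero shift, this reduces to the radical inverse (van der Corput) sequence. With a random scramble the first $b^m$ points are permuted but still fall one in each interval of length $b^{-m}$.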

The quality of a digital sequence is often measured by its t-value, which is
related to the generator matrices. Smaller values of t imply a better sequence.
The lemma below describes how to find the t-value for a digital sequence.

Lemma 1. [8,9,11] Let $\{x_0, x_1, x_2, \ldots\}$ be a digital sequence in base
b with generator matrices $C_1, \ldots, C_s$. For any positive integer m let
$c_{rmk}$ be the row vector containing the first m columns of the k-th row of
$C_r$. Let t be an integer, $0 \le t \le m$, such that for all non-negative
integers $\kappa = (\kappa_1, \ldots, \kappa_s)$ with $|\kappa| = m - t$ the
vectors $c_{rmk}$, $k = 1, \ldots, \kappa_r$, $r = 1, \ldots, s$, are linearly
independent over $\mathbb{Z}_b$. Then for any non-negative integer $\nu$ and
any $\lambda = 0, \ldots, b-1$ with $\lambda \le b - (\nu \bmod b)$, the set
$\{x_{\nu b^m}, \ldots, x_{(\nu + \lambda) b^m - 1}\}$ is a
$(\lambda, t, m, s)$-net in base b. (Note that a (1, t, m, s)-net is the same
as a (t, m, s)-net.) If the same value of t holds for all non-negative integers
m, then the digital sequence is a (t, s)-sequence.






4 Problem Formulation 



The integration problem studied here is integration over the unit cube:

$$I(f) = \int_{[0,1)^s} f(\mathbf{x}) \, d\mathbf{x}.$$

Quadrature rules to approximate this integral take the form:

$$Q(f; P, \{w_i\}) = \sum_{i=0}^{n-1} w_i f(\mathbf{x}_i)$$

for some set of nodes $P = \{x_0, \ldots, x_{n-1}\} \subset [0, 1)^s$ and some
set of weights $\{w_i\} = \{w_0, \ldots, w_{n-1}\}$. Quasi-Monte Carlo
quadrature methods choose P to be a set of points evenly distributed over the
integration domain and $w_i = n^{-1}$ for all i.

The quality of a quadrature rule can be assessed by a worst-case or
random-case analysis [5]. Let $\mathcal{B}_{\mathrm{wav}}$ be the unit ball in
the Haar wavelet space, i.e.,
$\mathcal{B}_{\mathrm{wav}} = \{f \in \mathcal{H}_{\mathrm{wav}} : \|f\|_{\mathcal{H}_{\mathrm{wav}}} \le 1\}$.
The quadrature error for a specific integrand and a specific quadrature rule is
given by $\mathrm{Err}(f; Q) = I(f) - Q(f; P, \{w_i\})$. Suppose that Q is
random, i.e., the nodes, weights, and number of function evaluations are all
chosen randomly. Specifically, let Q be chosen from some sample space,
$\mathcal{Q}_n$, according to some probability distribution, $\mu$, where the
average number of function evaluations is n. (Deterministic quadrature rules
are the case where $\mathcal{Q}_n$ has a single element.) The worst-case and
random-case error criteria for the Haar wavelet space are:

$$\text{worst-case:} \quad e^w(\mathcal{H}_{\mathrm{wav}}; \mathcal{Q}_n, \mu) := \operatorname*{rms}_{Q \in \mathcal{Q}_n} \ \sup_{f \in \mathcal{B}_{\mathrm{wav}}} |\mathrm{Err}(f; Q)|, \qquad (3a)$$

$$\text{random-case:} \quad e^r(\mathcal{H}_{\mathrm{wav}}; \mathcal{Q}_n, \mu) := \sup_{f \in \mathcal{B}_{\mathrm{wav}}} \ \operatorname*{rms}_{Q \in \mathcal{Q}_n} |\mathrm{Err}(f; Q)|. \qquad (3b)$$

The operator rms means root mean square. The worst-case error analysis
corresponds to the case where your enemy chooses the worst possible integrand
after you have chosen the particular quadrature rule. The random-case error
analysis corresponds to the case where your enemy chooses the worst possible
integrand after knowing your method for randomly choosing quadrature rules, but
before you choose a particular one.

The optimal error criteria for the Haar wavelet space are defined as the infima
of the above with respect to all possible quadrature rules:

$$e^w(\mathcal{H}_{\mathrm{wav}}, n) := \inf_{\mathcal{Q}_n, \mu} e^w(\mathcal{H}_{\mathrm{wav}}; \mathcal{Q}_n, \mu), \qquad e^r(\mathcal{H}_{\mathrm{wav}}, n) := \inf_{\mathcal{Q}_n, \mu} e^r(\mathcal{H}_{\mathrm{wav}}; \mathcal{Q}_n, \mu).$$

A sequence of random quadrature rules $(\mathcal{Q}_{n_m}, \mu_m)$,
m = 0, 1, 2, ..., is said to be optimal if it has the same asymptotic order as
the best possible quadrature rules. Specifically, one has worst-case and
random-case optimality if there exists some nonzero constant C independent of n
such that for all n = 1, 2, ...

$$\min_{n_m \le n} e^w(\mathcal{H}_{\mathrm{wav}}; \mathcal{Q}_{n_m}, \mu_m) \le C \, e^w(\mathcal{H}_{\mathrm{wav}}, n),$$

$$\min_{n_m \le n} e^r(\mathcal{H}_{\mathrm{wav}}; \mathcal{Q}_{n_m}, \mu_m) \le C \, e^r(\mathcal{H}_{\mathrm{wav}}, n).$$

It is possible for a sequence of quadrature rules to be optimal for one of the
above criteria and not for the other.

5 Results 

A key ingredient in the worst-case and random-case error analyses is the ooxoo 
matrix whose elements are the mean square errors of integrating the product of 
any two wavelet functions by a randomized quadrature. Define 

A ■■= E Q/Qn {[Err(^;Q)][Err(^;Q)] r } . 

Then the worst-case and random-case error analyses can be expressed as in the 
following theorem. 

Theorem 2. [2,5] Consider the case of random quadrature rules applied to
multivariate Haar wavelet series. The error criteria defined in (3) are given by

$$e^{\mathrm{w}}(\mathcal{H}_{\mathrm{wav}}; Q_n) = \sqrt{\mathrm{trace}(A)}, \quad \text{assuming } \alpha > 1/2,$$

$$e^{\mathrm{r}}(\mathcal{H}_{\mathrm{wav}}; Q_n) = \sqrt{\rho(A)}, \quad \text{assuming } \alpha > 0,$$

where trace denotes the trace, and $\rho$ denotes the spectral radius or largest eigenvalue.

The assumption $\alpha > 0$ is required to ensure that the Haar wavelet series are
square integrable, so that the random-case error analysis is valid. The assumption
$\alpha > 1/2$ is required to ensure that the Haar wavelet series are absolutely
summable, so that the worst-case error analysis is valid. From this theorem it can
be seen that the worst-case error criterion is never smaller than the random-case
error criterion, because the trace of a symmetric positive semidefinite matrix (the
sum of its nonnegative eigenvalues) is never smaller than its spectral radius.
The relationship in Theorem 2 in fact holds for all Hilbert spaces of
functions [5].
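The comparison between the two criteria can be illustrated numerically. The following is a minimal sketch, not the computation used in [2,5]: it assumes a small finite matrix built from random vectors as a stand-in for the infinite matrix $A$, and all names (`errs`, `worst_case`, `random_case`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the error vectors Err(psi; Q): 200 random draws of a
# 6-dimensional vector (a finite truncation, for illustration only).
errs = rng.standard_normal((200, 6))

# Average of outer products, mimicking an expectation of Err Err^T.
A = errs.T @ errs / errs.shape[0]

eigenvalues = np.linalg.eigvalsh(A)   # real eigenvalues of the symmetric matrix A
trace_A = float(np.trace(A))          # sum of the eigenvalues
rho_A = float(eigenvalues.max())      # spectral radius of a positive semidefinite matrix

worst_case = np.sqrt(trace_A)         # sqrt(trace(A)), the worst-case form in Theorem 2
random_case = np.sqrt(rho_A)          # sqrt(rho(A)), the random-case form in Theorem 2
print(worst_case, random_case)
```

Because every eigenvalue of such a matrix is nonnegative, the first printed value can never be smaller than the second.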

The above formulas are difficult to evaluate precisely in general. However,
for quasi-Monte Carlo quadrature rules based on scrambled digital nets one can
derive a simple formula for $e^{\mathrm{w}}(\mathcal{H}_{\mathrm{wav}}; Q_n)$. For any $\infty \times 1$ vector $\phi = (\phi_1, \phi_2, \ldots)^T$,
let $\ell(\phi)$ denote the number of zero elements in $\phi$ preceding the first nonzero
element:

$$\ell(\phi) = \min\{k : \phi_{k+1} \ne 0\}.$$

In other words, the smallest interval of the form [0, 6 0/c ), k = 0, 1, . . . that con- 
tains £ is [0 Next define the function G(£;a) as follows: 



G(£a) 



- 1 , 

^2ck01 

(fo2a®l 



£ = o, 

— 1 - 5l®(2«®l)£) _)_ 0 < £ < OO, 

1)® 1 (6- l)6 2a ® 1 , £ = oo. 



The kernel function, $K_{\mathrm{wav}}(x, y)$, is defined in terms of $G$ as



ICw av(x,y) = -1 + Hi 1 + /AG(£(</>(av) - <j>(y r ));a)]. 
r=l 



This is, in fact, the reproducing kernel of the Hilbert space $\mathcal{H}_{\mathrm{wav}}$ [17].

Theorem 3. [17] Let $\{x_i\}$ be a basic, non-scrambled digital sequence in a prime
power base $b$ as defined in Definition 1. For quasi-Monte Carlo quadrature using
any non-scrambled or randomly scrambled digital $(\lambda, t, m, s)$-net with $n = \lambda b^m$
points it follows that



[' e W (H W av ; Qn)? 



1 

n 






A®1 



o) + E 



2(A - i) 



i=0 



i=i 



6 m (g) 1 

y ^ K"wav(?X-ib rn -\-i 5 h) 
*=0 



Although analogous formulas for $e^{\mathrm{w}}(\mathcal{H}_{\mathrm{wav}}; Q_n)$ exist for general reproducing
kernel Hilbert spaces of integrands and general quadrature rules, they require
$O(n^2)$ operations to evaluate. Because of the good match between $\mathcal{H}_{\mathrm{wav}}$ and
digital nets the above formula only requires $O(n)$ operations to evaluate.
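To illustrate where the $O(n^2)$ cost comes from, here is a minimal sketch of the generic worst-case error formula for an equal-weight rule in a reproducing kernel Hilbert space. It does not use the Haar wavelet kernel: as an assumption for illustration it uses the product kernel $\prod_j (2 - \max(x_j, y_j))$ on the unit cube, for which the required integrals have simple closed forms. The function name is hypothetical.

```python
import numpy as np

def worst_case_error_sq(points):
    """Squared worst-case error of the equal-weight rule for the
    illustrative kernel prod_j (2 - max(x_j, y_j)) on [0, 1]^s."""
    x = np.asarray(points, dtype=float)
    n, s = x.shape
    # Double integral of the kernel over the unit cube.
    term1 = (4.0 / 3.0) ** s
    # Single sum: integral of K(x_i, y) over y is prod_j (3 - x_ij^2) / 2.
    term2 = (2.0 / n) * np.prod((3.0 - x**2) / 2.0, axis=1).sum()
    # Double sum over all pairs of points: this is the O(n^2) part.
    pair_max = np.maximum(x[:, None, :], x[None, :, :])
    term3 = np.prod(2.0 - pair_max, axis=2).sum() / n**2
    return term1 - term2 + term3

# A single point at 0.5 in one dimension.
print(worst_case_error_sq([[0.5]]))
```

The pairwise array `pair_max` has $n^2 s$ entries, which is exactly the cost that the net-specific formula of Theorem 3 avoids.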

The asymptotic behaviour of $e^{\mathrm{w}}(\mathcal{H}_{\mathrm{wav}}; Q_n)$ and $e^{\mathrm{r}}(\mathcal{H}_{\mathrm{wav}}; Q_n)$ for scrambled
net quadrature may be obtained by looking at the gain coefficients of nets as
defined in [15] and analyzed in [6]. Lower bounds on the optimal convergence rates
for quadrature rules may be obtained by constructing Haar wavelet series that
fool any quadrature rule. Putting these results together leads to the following
theorem.



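Theorem 4. [2] For quasi-Monte Carlo quadrature of Haar wavelet series using
scrambled $(\lambda, t, m, s)$-nets in base $b$, the error criteria defined in (3) have the
following asymptotic orders:

$$\min_{\lambda b^m \le n} e^{\mathrm{w}}(\mathcal{H}_{\mathrm{wav}}; Q_{\mathrm{sc},\lambda b^m}) \asymp e^{\mathrm{w}}(\mathcal{H}_{\mathrm{wav}}, n) \asymp n^{-\alpha} [\log n]^{(s-1)/2}, \quad \alpha > 1/2,$$

$$\min_{\lambda b^m \le n} e^{\mathrm{r}}(\mathcal{H}_{\mathrm{wav}}; Q_{\mathrm{sc},\lambda b^m}) \asymp e^{\mathrm{r}}(\mathcal{H}_{\mathrm{wav}}, n) \asymp n^{-\alpha - 1/2}, \quad \alpha > 0,$$

where $\asymp$ means “exactly the same asymptotic order”.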






6 Conclusion 

The original reason for investigating the integration of multivariate Haar wavelet 
series arose from studies of quasi- Monte Carlo quadrature of arbitrary functions. 
If one uses scrambled nets as the sampling points then this has been shown to 
be equivalent to integrating Haar wavelet series [4,6]. Thus the results reported 
above have broader applicability. 

However, no matter how smooth one assumes the integrand to be, the best
convergence one can obtain using scrambled digital nets is $O(n^{-3/2+\varepsilon})$ for the
worst-case error. It appears that to handle smoother integrands well one must
consider smoother wavelets and different quadrature rules. This is an open problem.



References 

1. H. Faure, Discrépance de suites associées à un système de numération (en dimension s), Acta Arith. 41 (1982), 337-351.

2. S. Heinrich, F. J. Hickernell, and R. X. Yue, Optimal quadrature for Haar wavelet spaces, 2001, submitted for publication to Math. Comp.

3. P. Hellekalek and G. Larcher (eds.), Random and quasi-random point sets, Lecture Notes in Statistics, vol. 138, Springer-Verlag, New York, 1998.

4. F. J. Hickernell and H. S. Hong, The asymptotic efficiency of randomized nets for quadrature, Math. Comp. 68 (1999), 767-791.

5. F. J. Hickernell and H. Woźniakowski, The price of pessimism for multidimensional quadrature, J. Complexity 17 (2001), to appear.

6. F. J. Hickernell and R. X. Yue, The mean square discrepancy of scrambled (t,s)-sequences, SIAM J. Numer. Anal. 38 (2001), 1089-1112.

7. H. S. Hong and F. J. Hickernell, Implementing scrambled digital nets, 2001, submitted for publication to ACM TOMS.

8. G. Larcher, Digital point sets: Analysis and applications, In Hellekalek and Larcher [3], pp. 167-222.

9. G. Larcher, On the distribution of digital sequences, Monte Carlo and quasi-Monte Carlo methods 1996 (H. Niederreiter, P. Hellekalek, G. Larcher, and P. Zinterhof, eds.), Lecture Notes in Statistics, vol. 127, Springer-Verlag, New York, 1998, pp. 109-123.

10. H. Niederreiter, Low discrepancy and low dispersion sequences, J. Number Theory 30 (1988), 51-70.

11. H. Niederreiter, Random number generation and quasi-Monte Carlo methods, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, 1992.

12. H. Niederreiter and C. Xing, Quasirandom points and global function fields, Finite Fields and Applications (S. Cohen and H. Niederreiter, eds.), London Math. Society Lecture Note Series, no. 233, Cambridge University Press, 1996, pp. 269-296.

13. H. Niederreiter and C. Xing, Nets, (t,s)-sequences and algebraic geometry, In Hellekalek and Larcher [3], pp. 267-302.

14. A. B. Owen, Randomly permuted (t,m,s)-nets and (t,s)-sequences, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (H. Niederreiter and P. J.-S. Shiue, eds.), Lecture Notes in Statistics, vol. 106, Springer-Verlag, New York, 1995, pp. 299-317.

15. A. B. Owen, Monte Carlo variance of scrambled net quadrature, SIAM J. Numer. Anal. 34 (1997), 1884-1910.

16. I. M. Sobol’, Multidimensional quadrature formulas and Haar functions (in Russian), Izdat. “Nauka”, Moscow, 1969.

17. R. X. Yue and F. J. Hickernell, The discrepancy of digital nets, 2001, submitted to J. Complexity.



An Application of Continuous Wavelet 
Transform in Differential Equations 



Qu Han-zhang 1, Xu Chen 2, and Zhao Ruizhen 3

1 Xi’an Post and Telecommunications Institute, Xi’an, P. R. China
2 Xidian University, Xi’an, 710071, P. R. China
3 Shenzhen University, 518060, P. R. China



Abstract. The relation between some differential equations and the integral
equations is discussed; the differential equations can be transformed
into the integral equations by using the continuous wavelet transform; 
the differential equations and the integral equations are equivalent not 
only in the weak topology but also in the strong topology; the discussion 
on the differential equations can be connected with the discussion on the 
integral equations. 

1 Introduction 

Wavelet theory includes the discrete wavelet transform and the continuous wavelet
transform. There are many papers on the discrete wavelet transform and its
applications, but only a few on the continuous wavelet transform, and especially
few on its applications. Therefore it is necessary to continue to discuss the
continuous wavelet transform and its applications.

There are some known results on the continuous wavelet transform. These results
mainly come from ’Ten Lectures on Wavelets’, written by Ingrid Daubechies.
Among those results there is the following.



Lemma 1.1 If $\psi \in L^2(R)$ and $0 < C_\psi < +\infty$, then for any $f(x) \in L^2(R)$



The above formula is true not only in the weak topology but also in the strong
topology.

In this paper we connect some differential equations with integral equations
by using the continuous wavelet transform, provide a method for discussing
the properties of the differential equations, and enlarge the applications
of the continuous wavelet transform.



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 107-116, 2001.
© Springer-Verlag Berlin Heidelberg 2001



2 Some Differential Equations and the Integral Equations 

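We consider the following equation.

$$\sum_{k=0}^{n} a_k(x)\, y^{(k)} = b(x) \qquad (1)$$

where $\{a_k(x);\ k = 0, 1, \ldots, n\} \subset L^\infty(R)$, $\{y^{(k)};\ k = 0, 1, \ldots, n\} \subset L^2(R)$, and $b(x) \in L^2(R)$.

Take $\psi$ such that $\{\psi^{(k)};\ k = 0, 1, \ldots, n\} \subset L^2(R)$, $\mathrm{Supp}(\psi) \subset [-L, L]$, and

$$0 < C_\psi = \int_R \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega = (2\pi)^{-1} < +\infty.$$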



According to Lemma 1.1 there is the following.

$$y(x) = \int_R \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle\, \psi_{a,b}(x)\, db \qquad (2)$$

We differentiate formula (2). If we could commute the order of
differentiation and integration, we should get

$$y^{(k)}(x) = \int_R \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle\, a^{-k}\, \psi^{(k)}_{a,b}(x)\, db, \qquad k = 0, 1, \ldots, n \qquad (3)$$
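The factor $a^{-k}$ in formula (3) comes from the chain rule applied to $\psi((x-b)/a)$. The following is a small numerical sketch of that identity for $k = 1$, under assumptions not stated in the paper: the normalization $\psi_{a,b}(x) = |a|^{-1/2}\psi((x-b)/a)$, the reading of $\psi^{(k)}_{a,b}$ as the dilated and translated $k$-th derivative of $\psi$, and an illustrative smooth $\psi$ that is not compactly supported.

```python
import numpy as np

def psi(x):
    # Illustrative wavelet (an assumption): x * exp(-x^2 / 2).
    return x * np.exp(-x**2 / 2.0)

def dpsi(x):
    # Exact first derivative of psi.
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def dilate_translate(f, a, b):
    # f_{a,b}(x) = |a|^{-1/2} f((x - b) / a)
    return lambda x: abs(a) ** -0.5 * f((x - b) / a)

a, b, k = 0.7, 0.3, 1
x = np.linspace(-3.0, 3.0, 601)
h = 1e-5

psi_ab = dilate_translate(psi, a, b)
# Central finite difference approximation of d/dx psi_{a,b}(x).
numeric = (psi_ab(x + h) - psi_ab(x - h)) / (2.0 * h)
# Right-hand side: a^{-k} times the dilated and translated derivative of psi.
formula = a ** (-k) * dilate_translate(dpsi, a, b)(x)

max_gap = float(np.max(np.abs(numeric - formula)))
print(max_gap)
```

The printed gap is at the level of finite-difference error, which is consistent with the $a^{-k}$ factor appearing under the integral in (3).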

Substituting the expressions (2) and (3) for $y$ and $y^{(k)}$ ($k = 0, 1, \ldots, n$) into
formula (1), we get the following.

$$\int_R \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle \sum_{k=0}^{n} a^{-k} a_k(x)\, \psi^{(k)}_{a,b}(x)\, db = b(x) \qquad (4)$$



Because we do not know whether the order of differentiation and
integration can be commuted, we do not know whether the integral operator on the
left of formula (4) exists or not.

$$\sum_{k=0}^{n} a_k(x)\, y^{(k)} = \int_R \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle \sum_{k=0}^{n} a^{-k} a_k(x)\, \psi^{(k)}_{a,b}(x)\, db \qquad (5)$$

If formula (5) is true, we say that differential equation (1) is equivalent to 
integral equation (4). If formula (5) is true in the weak topology, we say that dif- 
ferential equation (1) is equivalent to integral equation (4) in the weak topology. 
If formula (5) is true in the strong topology, we say that differential equation (1) 
is equivalent to integral equation (4) in the strong topology. 

Define

$$H = \{(f, f', \ldots, f^{(n)});\ \{f, f', \ldots, f^{(n)}\} \subset L^2(R)\} \subset \underbrace{L^2(R) \times \cdots \times L^2(R)}_{n+1}$$




If formula (5) is true, the integral operator on the right of formula (5) is a
bounded linear operator from $H$ to $L^2(R)$.

From our assumptions we cannot know whether the order of differentiation
and integration can be commuted, so we cannot know whether they are
equivalent. In this paper we mainly discuss whether formula (1) is equivalent to
formula (4). Are they equivalent in the weak topology? Are they equivalent in
the strong topology?

3 The Relation between Formula (1) and Formula (4) in
the Weak Topology

In order to discuss whether they are equivalent, firstly we discuss whether they 
are equivalent in the weak topology. 

Theorem 3.1 Formula (5) is true in the weak topology. That is, formula (1) 
is equivalent to formula (4) in the weak topology. 

Proof: We only need to prove that the operator on the left of formula (1)
is equivalent to the operator on the left of formula (4) in the weak topology.
In order to do this we only need to prove that in the weak topology, for
$k = 0, 1, \ldots, n$, $a_k(x) y^{(k)}$ is equivalent to
$\int_R \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle\, a^{-k} a_k(x)\, \psi^{(k)}_{a,b}(x)\, db$.
That is, for any $g(x) \in L^2(R)$, there is the following.

$$\langle a_k y^{(k)}, g\rangle = \int_R \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle \langle a^{-k} a_k \psi^{(k)}_{a,b}, g\rangle\, db \qquad (6)$$

Define

$$H_1 = \{(f, f', \ldots, f^{(n)});\ \{f^{(k)}\} \subset L^2(R),\ k = 0, 1, \ldots, n,\ \mathrm{Supp}(f^{(k)}) \text{ is compact}\}.$$

For any $x = (f, f', \ldots, f^{(n)}) \in H_1$,

$$\|x\|_{H_1} = \Big\{\sum_{k=0}^{n} \|f^{(k)}\|^2\Big\}^{1/2}.$$

Firstly we prove that formula (6) is true for any $(y, y', \ldots, y^{(n)}) \in H_1$.

For any $(y, y', \ldots, y^{(n)}) \in H_1$ we only need to calculate the inner product on
the left of formula (6).

< a k y {k \g >=< y (k \a k g >=< (y {k) ), ( a k g ) > 

= 2* J^y {k) %)(a k g)d(r]) dw 

= ^ Jj.y {k) )(v)(akg)d(v) J ^ dr, 








(' taking w = arj) 

f da ™ 

= 2lr I 12 f T \h~( '<P( a v)i’(av)(akg)dv 
Jr \ a rJ R {w k y(v 

( according to FubiniF theorem^) 

= J R ja^ a ' S ' /C 

(according to the property that inverse Fourier transform preserves their inner 
products ^ ) 

- L K 

(according to the property that (\f^)(arj) = (iar}) k if(arj)) 

= [ f^a® k f < y, (ip a , b ) >< (V^T, ( a k g) > db 
JR l a l JR 

( substituting (ip a ,b), (V^T /or ] ) 

= / r^-a®* [ < y,ip a ,b >< ^la k g > db 
Jr l a l 2 /a 

(according to the property that Fourier transform preserves their inner products) 

= f l"~F f < >< > db 

Jr M 2 Jr 

That is, for any $(y, y', \ldots, y^{(n)}) \in H_1$ formula (6) is true. That is, the integral
operator on the right of formula (6) is a bounded linear operator.

For any $(g_0, g_1, \ldots, g_n) \in \overline{H_1}$ there is a sequence $\{(y_l, y_l', \ldots, y_l^{(n)})\}$ of $H_1$
such that for any $m \ge 1$,

$$\Big\{\sum_{k=0}^{n} \|y_{m+1}^{(k)} - y_m^{(k)}\|^2\Big\}^{1/2} < 2^{-m}, \qquad \lim_{l \to \infty} \Big\{\sum_{k=0}^{n} \|g_k - y_l^{(k)}\|^2\Big\}^{1/2} = 0.$$

For $k = 0, 1, \ldots, n$, $g_k = y_1^{(k)} + \sum_{l=1}^{\infty} (y_{l+1}^{(k)} - y_l^{(k)})$. This is true in the strong
topology.

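According to the properties of bounded linear functionals we have the
following.

$$\langle a_k g_k, g\rangle = \langle g_k, a_k g\rangle = \Big\langle y_1^{(k)} + \sum_{l=1}^{\infty} (y_{l+1}^{(k)} - y_l^{(k)}),\ a_k g \Big\rangle$$

$$= \int_R \frac{da}{|a|^2} \int_R \langle y_1, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, a_k g\rangle\, db + \sum_{l=1}^{\infty} \int_R \frac{da}{|a|^2} \int_R \langle y_{l+1} - y_l, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, a_k g\rangle\, db$$

$$= \int_R \frac{da}{|a|^2} \int_R \Big\langle y_1 + \sum_{l=1}^{\infty} (y_{l+1} - y_l),\ \psi_{a,b} \Big\rangle \langle a^{-k} \psi^{(k)}_{a,b}, a_k g\rangle\, db$$

$$= \int_R \frac{da}{|a|^2} \int_R \langle g_0, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, a_k g\rangle\, db$$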

That is, for any $(g_0, g_1, \ldots, g_n) \in \overline{H_1}$ formula (6) is true in the weak topology.
Because $H_1 \subset H \subset \overline{H_1}$, for any $(y, y', \ldots, y^{(n)}) \in H$ formula (6) is true in
the weak topology.

We complete the proof.



4 The Relation between Formula (1) and Formula (4) in 
the Strong Topology 



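We prove that formula (1) is equivalent to formula (4) in the strong topology.

Theorem 4.1 Formula (1) is equivalent to formula (4) in the strong topology.

Proof: We prove that the following formula is true.

$$\lim_{A_1 \to 0,\ A_2 \to \infty,\ B \to \infty} \Big\| \sum_{k=0}^{n} a_k(x)\, y^{(k)}(x) - \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| \le B} \langle y, \psi_{a,b}\rangle \sum_{k=0}^{n} a^{-k} a_k(x)\, \psi^{(k)}_{a,b}(x)\, db \Big\| = 0$$

According to the triangle inequality and $\{a_k\} \subset L^\infty(R)$ we only need to prove
that for $k = 0, 1, \ldots, n$,

$$\lim_{A_1 \to 0,\ A_2 \to \infty,\ B \to \infty} \Big\| y^{(k)}(x) - \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| \le B} \langle y, \psi_{a,b}\rangle\, a^{-k}\, \psi^{(k)}_{a,b}(x)\, db \Big\| = 0 \qquad (7)$$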

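Firstly we prove that formula (7) is true for any $(y, y', \ldots, y^{(n)}) \in H_1$.

According to Riesz’s lemma, we have

$$\Big\| y^{(k)}(x) - \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| \le B} \langle y, \psi_{a,b}\rangle\, a^{-k}\, \psi^{(k)}_{a,b}(x)\, db \Big\|$$

$$= \sup_{\|g\| = 1} \Big| \Big\langle y^{(k)} - \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| \le B} \langle y, \psi_{a,b}\rangle\, a^{-k}\, \psi^{(k)}_{a,b}\, db,\ g \Big\rangle \Big|$$

$$\le \sup_{\|g\| = 1} \Big| \int_{|a| < A_1} \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big|$$

$$+ \sup_{\|g\| = 1} \Big| \int_{|a| > A_2} \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big|$$

$$+ \sup_{\|g\| = 1} \Big| \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| > B} \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big| \qquad (8)$$

(according to formula (6) and the triangle inequality)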

According to the process of proving Theorem 3.1 there is the following result.

$$\int_R \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db = \int_R \langle y^{(k)}, \psi_{a,b}\rangle \langle \psi_{a,b}, g\rangle\, db \qquad (9)$$

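Firstly we prove that the first expression at the end of formula (8) converges
to zero as $A_1 \to 0$, $A_2, B \to \infty$.

$$\sup_{\|g\| = 1} \Big| \int_{|a| < A_1} \frac{da}{|a|^2} \int_R \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big|$$

$$= \sup_{\|g\| = 1} \Big| \int_{|a| < A_1} \frac{da}{|a|^2} \int_R \langle y^{(k)}, \psi_{a,b}\rangle \langle \psi_{a,b}, g\rangle\, db \Big|$$

(according to formula (9))

$$\le \sup_{\|g\| = 1} \int_{|a| < A_1} \frac{da}{|a|^2} \int_R |\langle y^{(k)}, \psi_{a,b}\rangle|\, |\langle \psi_{a,b}, g\rangle|\, db$$

$$\le \sup_{\|g\| = 1} \Big\{ \int_{|a| < A_1} \frac{da}{|a|^2} \int_R |\langle y^{(k)}, \psi_{a,b}\rangle|^2\, db \Big\}^{1/2} \Big\{ \int_{|a| < A_1} \frac{da}{|a|^2} \int_R |\langle \psi_{a,b}, g\rangle|^2\, db \Big\}^{1/2}$$

$$\le \Big\{ \int_{|a| < A_1} \frac{da}{|a|^2} \int_R |\langle y^{(k)}, \psi_{a,b}\rangle|^2\, db \Big\}^{1/2}$$

(according to formula (2))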



The integral converges to zero as $A_1 \to 0$ because its infinite integral converges.

The second expression in formula (8) converges to zero as $A_2 \to \infty$. Its proof
is analogous to the first.

Finally we prove that the third expression in formula (8) converges to zero
as $B \to \infty$.

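Take

$$M = \sup_{\|g\| = 1} \Big| \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| > B} \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big|$$

$$\le \sup_{\|g\| = 1} \Big| \int_{A_1 \le |a| \le A_2,\ |a| < 1} \frac{da}{|a|^2} \int_{|b| > B} \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big|$$

$$+ \sup_{\|g\| = 1} \Big| \int_{A_1 \le |a| \le A_2,\ |a| \ge 1} \frac{da}{|a|^2} \int_{|b| > B} \langle y, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db \Big|$$

$$= M_1 + M_2$$

Because for $k = 0, 1, \ldots, n$, $\mathrm{Supp}(y^{(k)})$ is compact, there is an $N_1 > 0$ such
that for $k = 0, 1, \ldots, n$, $\mathrm{Supp}(y^{(k)}) \subset [-N_1, N_1]$.

If $|x| > N_1$, $y(x) = 0$. If we take $B > (L + N_1)$, then for $|b| > B$, $|x| < N_1$, and $|a| < 1$ we have
$|(x - b)/a| > L$. That is, $M_1 = 0$.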



M 2 = sup | / 
A^ Ja x 



and 1 



TJ2 [ < y^a,b >< a K -ip { a k) b ,g > db\ 

J a l 

f 01/ I <y,ipa,b> || <^ 1,9 > \db 

and 1 



< sup 

A J A\ <r^tJft+A 2 and 





C da 


/ 1 < VAa,b 


< sup < 


/ m 12 


A^ 1 | 


yn N . 


Ib~+a 


sup < 


f f da 


\ 1 <da },9 > 


A A - 1 


l Jr \ a \ 


JR 



As $B \to \infty$ the first expression in the above formula converges to zero because
its infinite integral converges. The second is bounded.

That is, the third expression in formula (8) converges to zero as $B \to \infty$.
Formula (7) is true for any $(y, y', \ldots, y^{(n)}) \in H_1$.

In order to prove that formula (7) is true for any $(y, y', \ldots, y^{(n)}) \in \overline{H_1}$, we
discuss the following bilinear forms.

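$$T_{A_1,A_2,B}((f, f', \ldots, f^{(n)}), g) = \int_{A_1 \le |a| \le A_2} \frac{da}{|a|^2} \int_{|b| \le B} \langle f, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db$$

$$T((f, f', \ldots, f^{(n)}), g) = \int_R \frac{da}{|a|^2} \int_R \langle f, \psi_{a,b}\rangle \langle a^{-k} \psi^{(k)}_{a,b}, g\rangle\, db$$

$$(f, f', \ldots, f^{(n)}) \in H_1, \quad g \in L^2(R)$$

According to the definition of the integral we have

$$\lim_{A_1 \to 0,\ A_2 \to \infty,\ B \to \infty} T_{A_1,A_2,B}((f, f', \ldots, f^{(n)}), g) = T((f, f', \ldots, f^{(n)}), g)$$

According to the properties of bounded linear operators we can extend
$T_{A_1,A_2,B}$ and $T$ from $H_1 \times L^2(R)$ to $\overline{H_1} \times L^2(R)$.

We fix $g \in L^2(R)$. For any $x \in \overline{H_1}$, we have $\lim_{A_1 \to 0,\ A_2 \to \infty,\ B \to \infty} T_{A_1,A_2,B}(x, g) = T(x, g)$,
that is, $\{T_{A_1,A_2,B}(x, g);\ 0 < A_1 < A_2 < +\infty,\ 0 < B < +\infty\}$ is bounded.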




$\overline{H_1}$ is a closed subspace of $L^2(R) \times \cdots \times L^2(R)$ ($n+1$ factors). According to the uniform
boundedness principle we have that $\{\|T_{A_1,A_2,B}(g)\|;\ 0 < A_1 < A_2 < +\infty,\ 0 < B < +\infty\}$
is bounded. That is, there is a positive number
$K(g) = \sup\{\|T_{A_1,A_2,B}(g)\|;\ 0 < A_1 < A_2 < +\infty,\ 0 < B < +\infty\}$ such that for any
$0 < A_1 < A_2 < +\infty$, $0 < B < +\infty$, $\|T_{A_1,A_2,B}(g)\| \le K(g)$.

Because for any $g \in L^2(R)$, $\{\|T_{A_1,A_2,B}(g)\|;\ 0 < A_1 < A_2 < +\infty,\ 0 < B < +\infty\}$
is bounded, $\|T_{A_1,A_2,B}(g_1 + g_2)\| \le \|T_{A_1,A_2,B}(g_1)\| + \|T_{A_1,A_2,B}(g_2)\|$,
$\|T_{A_1,A_2,B}(\alpha g)\| = |\alpha|\, \|T_{A_1,A_2,B}(g)\|$, and $L^2(R)$ is a Hilbert space, according to the
uniform boundedness principle we have that $\{\|T_{A_1,A_2,B}\|;\ 0 < A_1 < A_2 < +\infty,\ 0 < B < +\infty\}$
is bounded, that is, there is a positive number $K$ such that
$\sup\{\|T_{A_1,A_2,B}\|;\ 0 < A_1 < A_2 < +\infty,\ 0 < B < +\infty\} \le K$.

Secondly we prove that formula (7) is true for any $(g_0, g_1, \ldots, g_n) \in \overline{H_1}$.

For any $\varepsilon > 0$, there is $(y, y', \ldots, y^{(n)}) \in H_1$ such that

Ct 91 -// n !)<l 

1=0 

Because for k = 0 , 1 , • • • , n, 



lim 

Ail 0, A 2 ,B i 



II y (k) - f TJ2 [ < V, 1 pa,b 

Ja 1 ^+a*a 2 \ a \ 



><a® K ^ 






a,b 



= 0 



we have 5i > 0, N 2 > 0,Nb > 0 such that for k = 0, 1 , • • • ,n, if A\ < <5i , A 2 
> N 2 , B > Nb , 



l|y (fe) - 



da 



f <y, V’ a,b >< a® K ip (k) b db\\ < j 

Jb^aa 4 



/ Ai ^ mua 2 Or JB^AA 

If Ai < 5i, A 2 > N 2 , B > Nb , we have 

lls(k) - [ Or [ < go,ipa,b >< a® K ip {k) b db\\ 

Ja 1 ^AA*A 2 01 JB^AA 

= sup I < g {k) - [ Or / < 9oAa,b >< a® K ip {k) b db,g > 

/§Y ^1 J Ai ^AA+A 2 01 J B^-AA 



< sup \<g k - y (k) ,g > 



+ 



sup I < y {k) - [ O' [ < y,ipa,b >< a® K tp {k) b db,g > 

J Al ^k+*A 2 M 2 

f ]0 f <y~ 90, ^ a,b >< a ® K dabdb, g > 

JAi+JMk+Ai l a l 



+ sup , 

Ja 1 ^+i^a 2 l a l Jb«** 



< y™ - 



< II 9k - y (k) 

da 



[ T% [ <y,'<Pa,b><a® K ip {k) b db\\ 

JAi«**h.A 2 l a l Jb«** 







+K^2\\y {l) -gi\\<^+ e 1 + l<e 

1 = 0 

Formula (7) is true for any $(g_0, g_1, \ldots, g_n) \in \overline{H_1}$.

Because $H \subset \overline{H_1}$, for any $(y, y', \ldots, y^{(n)}) \in H$, formula (7) is true.
We complete the proof.



5 Example and Conclusion 



We introduce the following example. 

Example 5.1 We consider the following differential equation.

$$\sum_{i=0}^{n} a_i(x)\, y^{(i)} = f(x) \qquad (10)$$

$$\{f(x), a_0(x), \ldots, a_n(x)\} \subset C[-\pi, \pi], \qquad \{y^{(i)}(x);\ i = 0, 1, \ldots, n\} \subset L^2(R) \qquad (11)$$

We require the solution of differential equation (10) that satisfies formula (11).
If $x \notin [-\pi, \pi]$, for any $i = 0, 1, \ldots, n$, we define $y^{(i)}(x) = 0$, $a_i(x) = 0$,
$f(x) = 0$. Then $\{f(x), a_0(x), \ldots, a_n(x)\} \subset L^\infty(R)$ and
$\{f(x), a_0(x), \ldots, a_n(x), y(x), \ldots, y^{(n)}(x)\} \subset L^2(R)$.

Take 

, , x f COSX X E [ — 7T, 7T 1 

^> = {0 

y 1 sinfij + sinOrj — 1 )tt 

m = — + „-i 1 

0 < Cijj <C +00 

According to the above results there is the following. 



+ da 


r&Jk + X 


> w . 


J <g)JkJfr+x 




l 


:/ 


y(z)cos- 







b-z_ 

a 






2 = 0 



(X E [ — 7T, 7r] ) 

In order to solve equation (10) we only need to solve the above integral 
equation. 

We connect some differential equations with integral equations by using
the method of the continuous wavelet transform. We obtain that formula (1) is
equivalent to formula (4) not only in the weak topology but also in the strong
topology.



References 

1. Ingrid Daubechies. Ten Lectures on Wavelets. Philadelphia, Pennsylvania: Society
for Industrial and Applied Mathematics. 1992

2. Zheng Wei-xing, Wang Sheng-wang. Outline of real function and functional analysis.
Academical Education Press, China. 1991

3. Song Guo-xiang. Numerical Analysis and Introduction to Wavelet. Science and
Technology Press of Henan, China. 1993

4. Charles K. Chui. An Introduction to Wavelets. Academic Press, Inc. 1992




Stability of Biorthogonal Wavelet Bases in $L^2(R)$



Paul F. Curran 1 and Gary McDarby 2

1 Department of Electronic and Electrical Engineering, University College Dublin
Belfield, Dublin 4, Ireland
paul.curran@ucd.ie
2 Medialab Europe,
Crane St., Dublin 8, Ireland
gary@media.mit.edu



Abstract. For stability of biorthogonal wavelet bases associated with 
finite filter banks, two related Lawton matrices must have a simple eigen- 
value at one and all remaining eigenvalues of modulus less than one. If 
the filters are perturbed these eigenvalues must be re-calculated to de- 
termine the stability of the new bases - a numerically intensive task. We 
present a simpler stability criterion. Starting with stable biorthogonal 
wavelet bases we perturb the associated filters while ensuring that the 
new Lawton matrices continue to have an eigenvalue at one. We show 
that stability of the new biorthogonal wavelet bases first breaks down, 
not just when a second eigenvalue attains a modulus of one, but rather 
when this second eigenvalue actually equals one. Stability is therefore 
established by counting eigenvalues at one of finite matrices. The new 
criterion, in conjunction with the lifting scheme, provides an algorithm 
for the custom design of stable filter banks. 



1 Introduction 

In 1988 Daubechies [1] discovered a class of compactly supported orthonormal 
bases for $L^2(R)$ which included the Haar basis as a special case. Mallat [2] established
the relationship between wavelet transforms and multi-resolution analyses
and showed that a discrete wavelet transform (relative to an orthonormal basis) 
can be implemented using orthogonal filter bank theory. 

Whereas orthogonality of the basis is a useful property in the analysis and 
synthesis of signals, it is not indispensable. In 1992 Cohen, Daubechies and 
Feauveau [3] introduced the idea of biorthogonal wavelet bases. In this case two 
distinct bases are employed, one for analysis and one for synthesis. The two 
bases are not necessarily orthogonal in their own right but are orthogonal to one 
another. Biorthogonal bases offer increased flexibility in the design of the associ- 
ated filter bank enabling, for example, the construction of filter banks from linear 
phase filters. Cohen, Daubechies and Feauveau [3], Cohen and Daubechies [4] 
and Strang [5] provide necessary and sufficient conditions for a pair of dual filters 
to generate biorthogonal compactly supported wavelet bases in $L^2(R)$.

Sweldens [6] introduced the lifting scheme for designing biorthogonal filter
banks. This scheme formally maintains biorthogonality but does not guarantee
that the filter bank has associated compactly supported wavelet bases in $L^2(R)$.
Whereas the lifting scheme in general contains many free parameters we
reformulate it in terms of a single parameter. The principal contribution of the
present work is the observation that for real, finite filters the single parameter
dependent lifting scheme generates biorthogonal filter banks having associated
wavelets in $L^2(R)$ provided the parameter lies in an open interval containing zero.
We present an algorithm for finding the largest interval of this kind. In conjunction
with the lifting scheme, this algorithm provides a method for the custom
design of biorthogonal filter banks with associated wavelet bases in $L^2(R)$. While
the method is cumbersome for large filters it has been found to be numerically
tractable for filters having up to twenty taps. The resulting wavelet bases depend
continuously upon the single parameter of the lifting scheme. In principle
therefore it is possible, by employing a variety of different optimisation techniques,
to select the value of this parameter such that the associated wavelet basis is
optimal in some sense.

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 117-128, 2001.
© Springer-Verlag Berlin Heidelberg 2001

2 Lawton Matrices 

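Given $h = [h_{-m}, \ldots, h_{-1}, h_0, h_1, \ldots, h_m]$, a real filter of length $(2m + 1)$, as
usual we define the z-transform of the filter to be:

$$H(z) = \sum_{k=-m}^{m} h_k z^{-k}. \qquad (1)$$

We say that filter $h$ is balanced if $H(1) = 1$. We define also an associated real
sequence $\eta$ as $\eta_k = 2 \sum_q h_{q+k} h_q$ where the filter coefficients with indices outside
range $-m$ to $m$ are defined to be zero. The real Lawton matrix [7] $\Lambda$ associated
with the filter is the $(4m + 1) \times (4m + 1)$ matrix:

$$\Lambda = \begin{pmatrix}
\eta_{2m} & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\eta_{2m-2} & \eta_{2m-1} & \eta_{2m} & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & \vdots & & & & & \vdots \\
\eta_{2m} & \eta_{2m-1} & \eta_{2m-2} & \cdots & \eta_1 & \eta_0 & \eta_1 & \cdots & \eta_{2m-2} & \eta_{2m-1} & \eta_{2m} \\
0 & 0 & \eta_{2m} & \cdots & \eta_3 & \eta_2 & \eta_1 & \cdots & \eta_{2m-4} & \eta_{2m-3} & \eta_{2m-2} \\
\vdots & & & & & \vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & \eta_{2m} & \eta_{2m-1} & \eta_{2m-2} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \eta_{2m}
\end{pmatrix} \qquad (2)$$

Since $\eta_{-k} = \eta_k$, the entry in row $i$ and column $j$, with $i, j = -2m, \ldots, 2m$, is $\eta_{2i-j}$.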



We define a pair of dual real finite filters to be a set of two real balanced
filters $(h, \tilde{h})$ of length $(2m + 1)$ and $(2\tilde{m} + 1)$ respectively such that

$$H(e^{i\theta})\, \overline{\tilde{H}(e^{i\theta})} + H(e^{i(\theta+\pi)})\, \overline{\tilde{H}(e^{i(\theta+\pi)})} = 1 \quad \text{for all } \theta. \qquad (3)$$

Note that the overbar denotes complex conjugation. The following result is
well known [4]:

Stability Condition: A pair of dual real finite filters, $(h, \tilde{h})$, generate biorthogonal
Riesz bases of compactly supported wavelets iff the Lawton matrix associated
with each filter has a simple eigenvalue at one and all remaining eigenvalues have
modulus less than one.

Lemma 1. A pair of dual real finite filters, $(h, \tilde{h})$, generate biorthogonal Riesz
bases of compactly supported wavelets only if the sum of the elements in every
column of the Lawton matrix associated with each of the filters is one.

We call this necessary condition on the Lawton matrix associated with a
balanced real filter the column sum condition. It transpires that the column
sum condition corresponds to a simple condition on the filter itself [8]:

Lemma 2. The Lawton matrix associated with a real balanced filter $h$ of length
$(2m + 1)$ satisfies the column sum condition iff $H(-1) = 0$.
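As a concrete illustration of the definitions in this section, the following minimal sketch builds a Lawton matrix from a filter and checks the column sum condition and the eigenvalue condition numerically. It assumes the indexing in which the entry in row $i$ and column $j$ ($i, j = -2m, \ldots, 2m$) is $\eta_{2i-j}$ with $\eta_k = 2\sum_q h_{q+k}h_q$; the example filter and the function name `lawton_matrix` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def lawton_matrix(h):
    """Lawton matrix for a real filter h = [h_{-m}, ..., h_m] of odd length 2m + 1."""
    h = np.asarray(h, dtype=float)
    m = (len(h) - 1) // 2
    # eta_k = 2 * sum_q h_{q+k} h_q for k = -2m..2m, stored at index k + 2m.
    eta = 2.0 * np.correlate(h, h, mode="full")
    size = 4 * m + 1
    lam = np.zeros((size, size))
    for i in range(-2 * m, 2 * m + 1):
        for j in range(-2 * m, 2 * m + 1):
            k = 2 * i - j
            if abs(k) <= 2 * m:          # eta_k is zero outside this range
                lam[i + 2 * m, j + 2 * m] = eta[k + 2 * m]
    return lam

h = [0.25, 0.5, 0.25]                    # example filter (an assumption), m = 1
m = (len(h) - 1) // 2
lam = lawton_matrix(h)

H_at_1 = sum(h)                                                            # H(1)
H_at_minus_1 = sum(c * (-1.0) ** k for k, c in zip(range(-m, m + 1), h))   # H(-1)
column_sums = lam.sum(axis=0)

eigs = np.linalg.eigvals(lam)
n_at_one = int(np.sum(np.abs(eigs - 1.0) < 1e-8))
n_on_or_outside_circle = int(np.sum(np.abs(eigs) > 1.0 - 1e-8))
print(column_sums, np.sort(np.abs(eigs)))
```

For this filter, $H(-1) = 0$ and every column sums to one, in line with Lemma 2, and exactly one eigenvalue sits at one while the rest lie strictly inside the unit circle.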

3 Lawton Symmetry 

Given a matrix $A \in C^{N \times M}$ with coefficients $a_{ij}$ let matrix $A^{\flat} \in C^{N \times M}$ be
defined by $[A^{\flat}]_{ij} = a_{N+1-i,\,M+1-j}$ for all $i, j$. Subject to this definition a matrix $A$
is said to be Lawton symmetric if $A = A^{\flat}$. It is not difficult to show that the
Lawton matrix associated with a real filter of length $(2m + 1)$ is real and Lawton
symmetric. We observe also the following result:

Lemma 3. A real, $(2M + 1) \times (2M + 1)$, Lawton symmetric matrix, $L$, has the
following structure:

$$L = \begin{pmatrix} A & a & B \\ b^T & c & (b^{\flat})^T \\ B^{\flat} & a^{\flat} & A^{\flat} \end{pmatrix} \qquad (4)$$

where $A, B \in R^{M \times M}$, $a, b \in R^{M}$ and $c \in R$.

Employing this result and defining $w^T = [1, \ldots, 1] \in R^{1 \times M}$, $E \in R^{M \times M}$
such that $E_{ij} = \delta_{M+1-i,\,j}$ (where $\delta_{ij}$ denotes the Kronecker delta), we obtain
the following:



Lemma 4. The eigenvalues of a real, $(2M + 1) \times (2M + 1)$, Lawton symmetric
matrix $\Lambda$ satisfying the column sum condition may be classified as follows:

1. One of them equals 1.

2. A further $M$ of them are the eigenvalues of the reduced order matrix $(A - BE)$
with one of these being $\frac{1}{2}$.

3. The remaining $M$ are eigenvalues of the reduced order matrix $(A + BE - 2aw^T)$.
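The classification in Lemma 4 can be checked numerically on a small case. The sketch below is an illustration under assumptions: the $5 \times 5$ matrix is the Lawton matrix of the example filter $h = [1/4, 1/2, 1/4]$ (a hypothetical choice, with entries $\eta_{2i-j}$), and the blocks $A$, $a$, $B$ are read off as in the structure of Lemma 3.

```python
import numpy as np

# Lawton matrix of the example filter h = [1/4, 1/2, 1/4] (an assumed example).
lam = np.array([
    [1/8, 0,   0,   0,   0  ],
    [3/4, 1/2, 1/8, 0,   0  ],
    [1/8, 1/2, 3/4, 1/2, 1/8],
    [0,   0,   1/8, 1/2, 3/4],
    [0,   0,   0,   0,   1/8],
])

M = (lam.shape[0] - 1) // 2
A = lam[:M, :M]              # top-left M x M block
a = lam[:M, M]               # upper part of the middle column
B = lam[:M, M + 1:]          # top-right M x M block
E = np.fliplr(np.eye(M))     # E_ij = delta_{M+1-i, j}: the exchange matrix
w = np.ones(M)

skew = np.linalg.eigvals(A - B @ E)                        # class 2 of Lemma 4
sym = np.linalg.eigvals(A + B @ E - 2.0 * np.outer(a, w))  # class 3 of Lemma 4

predicted = np.sort(np.real(np.concatenate(([1.0], skew, sym))))
actual = np.sort(np.real(np.linalg.eigvals(lam)))
print(predicted, actual)
```

The two printed arrays should agree up to rounding, and one of the eigenvalues of $A - BE$ is $\frac{1}{2}$, as the lemma states.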



We call the eigenvalue at 1 the symmetric eigenvalue of type (1) and that at
$\frac{1}{2}$ the skew-symmetric eigenvalue of type (1). The remaining $M - 1$ eigenvalues
of the second class we call the skew-symmetric eigenvalues of type (2) and the
eigenvalues of the third class we call the symmetric eigenvalues of type (2). The
terminology is inspired by the readily established facts that symmetric eigenvalues
have associated eigenvectors which are symmetric, i.e. have the following
form:

$$\begin{pmatrix} x_1 \\ x_0 \\ E x_1 \end{pmatrix} \quad \text{for some vector } x_1 \in C^{M \times 1} \text{ and } x_0 \in C \qquad (5)$$

and similarly that skew-symmetric eigenvalues have associated eigenvectors
which are skew-symmetric, i.e. which have the following form:

$$\begin{pmatrix} y_1 \\ 0 \\ -E y_1 \end{pmatrix} \quad \text{for some vector } y_1 \in C^{M \times 1} \qquad (6)$$

4 Non-negativeness

Given any vector $v \in R^{2M+1}$, $v$ will be said to be non-negative, denoted $v \ge 0$,
if:

$$\mathrm{Re}\,(V(e^{i\theta})) + \mathrm{Im}\,(V(e^{i\theta})) \ge 0 \quad \forall \theta \in [0, 2\pi] \qquad (7)$$

where $V(z)$ denotes the z-transform of $v$ as above. All subsequent references
to non-negative vectors are understood to be in this sense.

Krein and Rutman [9] define a convex cone in a finite dimensional, real vector
space to be a subset, $C$, of the vector space having the following properties:

1. If $x \in C$ then $\alpha x \in C$ for all scalars $\alpha \ge 0$.

2. If $x, y \in C$ then $x + y \in C$.

3. If $x, y \in C$ then $x + y \ne 0$.

4. $C$ is closed relative to the standard Euclidean norm-topology on the vector
space.

Consider the set of all real, $(2M + 1) \times 1$ non-negative vectors:

$$K = \{v \in R^{2M+1} \mid v \ge 0\}. \qquad (8)$$

The set $K$ has two properties that prove to be significant in the study of
Lawton matrices:

Lemma 5. $K$ is a convex cone (in the sense of Krein and Rutman).

Lemma 6. $K + (-K) = R^{2M+1}$.



The previous results permit a number of corollaries. Let $\mathcal{L} = \{v \in R^{2M+1} \mid v = v^{\flat}\}$,
i.e. $\mathcal{L}$ is the set of all real, Lawton symmetric, $(2M + 1) \times 1$
vectors.

Corollary 1.

1. $\mathcal{L}$ is a subspace of $R^{2M+1}$.

2. A real Lawton symmetric matrix maps $\mathcal{L}$ into itself.

3. $K \cap \mathcal{L}$ is a convex cone in $\mathcal{L}$.

4. $(K \cap \mathcal{L}) + (-K \cap \mathcal{L}) = \mathcal{L}$.

Let $Z_0 = \{v \in R^{2M+1} \mid [1, \ldots, 1]v = 0,\ [M, \ldots, 1, 0, -1, \ldots, -M]v = 0\}$.

Corollary 2.

1. $Z_0$ is a subspace of $R^{2M+1}$.

2. A real Lawton symmetric matrix that satisfies the column sum condition
maps $Z_0$ into itself.

3. $(K \cap Z_0)$ is a convex cone in $Z_0$.

4. $(K \cap Z_0) + (-K \cap Z_0) = Z_0$.

Corollary 3.

1. $Z_0 \cap \mathcal{L}$ is a subspace of $R^{2M+1}$.

2. A real Lawton symmetric matrix that satisfies the column sum condition
maps $Z_0 \cap \mathcal{L}$ into itself.

3. $K \cap Z_0 \cap \mathcal{L}$ is a convex cone in $Z_0 \cap \mathcal{L}$.

4. $(K \cap Z_0 \cap \mathcal{L}) + (-K \cap Z_0 \cap \mathcal{L}) = Z_0 \cap \mathcal{L}$.

One further property of non-negative, real vectors that will be required in
our subsequent discussion of Lawton matrices may be stated as follows:

Lemma 7. There exists no non-zero, real, skew-symmetric, non-negative vector
in $R^{2M+1}$.
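The non-negativeness condition (7) can be probed numerically on a grid of angles. The sketch below is only an approximate check, under assumptions: it samples $\theta$ on a finite grid, uses the convention $V(z) = \sum_k v_k z^{-k}$ with $k = -M, \ldots, M$, and the function name `is_non_negative` and the tolerance are hypothetical.

```python
import numpy as np

def is_non_negative(v, n_grid=2048, tol=1e-12):
    """Grid check of Re V(e^{i theta}) + Im V(e^{i theta}) >= 0 on [0, 2 pi]."""
    v = np.asarray(v, dtype=float)
    M = (len(v) - 1) // 2
    k = np.arange(-M, M + 1)
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid)
    # V(e^{i theta}) = sum_k v_k e^{-i k theta}, evaluated for every grid angle.
    V = np.exp(-1j * np.outer(theta, k)) @ v
    return bool(np.all(V.real + V.imag >= -tol))

symmetric_example = [0.25, 0.5, 0.25]   # V = 1/2 + (1/2) cos(theta)
skew_example = [1.0, 0.0, -1.0]         # V = 2i sin(theta)
print(is_non_negative(symmetric_example), is_non_negative(skew_example))
```

The skew-symmetric example gives $\mathrm{Re}\,V + \mathrm{Im}\,V = 2\sin\theta$, which changes sign, so it fails the test; its negative fails as well, which is consistent with Lemma 7.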

Corresponding to the definition of non-negative, real vectors given above 
we now propose a definition of non-negative, real matrices. Given any matrix 
L E i^( 2M+1 ) ( 2M +!) we sa y that L is non-negative, denoted L > 0, if Lv > 0 
for all v > 0 in R 2M + l . 

A significant feature of Lawton matrices associated with real, finite filters is 
that they are non-negative. This observation is formally stated as follows: 

Lemma 8. The Lawton matrix, A, associated with a real filter of length $(2m+1)$
is non-negative.




122 Paul F. Curran and Gary McDarby 



In terms of the cones introduced previously, lemma 8 asserts that a Lawton
matrix associated with a real, finite filter of length $(2m+1)$ defines a linear
operator on the real vector space $\mathbb{R}^{4m+1}$ which maps the convex cone $K$ (with
$M = 2m$) into itself. There exist some elementary, but important, corollaries to
this result:

Corollary 4. By restriction, a Lawton matrix associated with a real, finite filter
defines linear operators on the real vector spaces $\mathcal{L}$, $Z_0$, $Z_0 \cap \mathcal{L}$, which map the
convex cones $(K \cap \mathcal{L})$, $(K \cap Z_0)$, $(K \cap Z_0 \cap \mathcal{L})$ (with $M = 2m$) respectively into
themselves.



5 Generalised Frobenius-Perron Theory 

In their celebrated treatise, Krein and Rutman [9] present a generalisation of 
the classical Frobenius-Perron theorem which we may paraphrase as follows: 

Theorem 1. Let C be a convex cone with non-null interior in a real, finite-
dimensional vector space; if a linear mapping Q maps C into itself and is not
nilpotent, then there is a real, positive eigenvalue $\lambda_Q$ of Q with an associated
eigenvector lying in C, having the property that no other eigenvalue of Q has
modulus exceeding $\lambda_Q$.
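As a concrete illustration of theorem 1 in its classical Frobenius-Perron
setting, the following sketch takes the cone C to be the non-negative orthant of
R^3 and Q to be an entrywise non-negative matrix, so that Q maps C into itself.
The matrix Q and the helper name `dominant_eigenpair` are illustrative
assumptions and do not come from the paper.

```python
import numpy as np

def dominant_eigenpair(Q):
    """Return the eigenvalue of largest modulus and an eigenvector for it."""
    values, vectors = np.linalg.eig(np.asarray(Q, dtype=float))
    i = int(np.argmax(np.abs(values)))
    v = vectors[:, i]
    # Rescale so that the component of largest modulus equals 1.
    v = v / v[int(np.argmax(np.abs(v)))]
    return values[i], v

# Hypothetical example: entrywise non-negative, every column sums to 1.
Q = np.array([[0.50, 0.25, 0.00],
              [0.50, 0.50, 0.50],
              [0.00, 0.25, 0.50]])

lam, v = dominant_eigenpair(Q)
print("dominant eigenvalue:", lam)
print("associated eigenvector:", v)
```

For this particular Q the dominant eigenvalue is real and positive and its
eigenvector has no negative component, as the theorem predicts for a map that
leaves the orthant invariant.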

By employing the results of section 4 together with theorem 1, we may make 
a number of assertions concerning Lawton matrices associated with real, finite 
filters. 

Lemma 9. Let A be a Lawton matrix associated with a real, finite, balanced
filter which satisfies the column sum condition, then there exists a real, positive
eigenvalue, L, of A such that:

(i) all remaining eigenvalues of A have modulus less than or equal to L,

(ii) there exists a real, non-negative eigenvector, $v_{(L)}$, associated with L.

Lemma 10. Let A be a Lawton matrix associated with a real, finite, balanced
filter which satisfies the column sum condition, then there exists a real, positive
symmetric eigenvalue, S, of A such that:

(i) all remaining symmetric eigenvalues of A have modulus less than or equal to
S,

(ii) there exists a real, non-negative eigenvector, $v_{(S)}$, associated with S.

Lemma 11. Let A be a Lawton matrix associated with a real, finite, balanced
filter which satisfies the column sum condition, then there exists a real, positive
eigenvalue, $\rho$, of A which is either symmetric of type (2) or skew-symmetric of
type (2) such that:

(i) all remaining symmetric and skew-symmetric eigenvalues of type (2) of A
have modulus less than or equal to $\rho$,

(ii) there exists a real, non-negative eigenvector, $v_{(\rho)}$, associated with $\rho$.

Lemma 12. Let A be a Lawton matrix associated with a real, finite, balanced
filter which satisfies the column sum condition, then there exists a real, positive
eigenvalue, $\sigma$, of A which is symmetric of type (2) such that:

(i) all remaining symmetric eigenvalues of type (2) of A have modulus less than
or equal to $\sigma$,

(ii) there exists a non-negative eigenvector, $v_{(\sigma)}$, associated with $\sigma$.

Lemma 7 permits us to make a number of observations concerning the
eigenvalues L, S, $\rho$ and $\sigma$ of lemmas 9-12. The proof of these observations is
included to indicate the utility of lemma 7.

Lemma 13. Eigenvalue L is symmetric and equals eigenvalue S. Eigenvalue $\rho$
is symmetric of type (2) and equals eigenvalue $\sigma$.

Proof. Lemma 9 assures that $v_{(L)}$ is real, non-zero and non-negative. By lemma 7
this vector cannot, therefore, be skew-symmetric. Hence eigenvalue L cannot be
skew-symmetric and must, therefore, be symmetric. It is now trivial to show
that L = S.

Lemma 11 assures that $v_{(\rho)}$ is real, non-zero and non-negative. As above,
lemma 7 asserts that this vector cannot be skew-symmetric and, therefore, that
eigenvalue $\rho$ cannot be skew-symmetric. Hence $\rho$ must be symmetric of type (2)
and it is now trivial to show that $\rho = \sigma$.

Note that the eigenvalue $\sigma$ of lemma 12 is uniquely defined by the Lawton
matrix (and hence by the real filter associated with it). We are finally in a
position to state and prove the primary result of this investigation:

Theorem 2. The Lawton matrix associated with a real, finite, balanced filter
satisfying $H(-1) = 0$ has a simple eigenvalue at one and all remaining
eigenvalues have modulus less than one iff the particular eigenvalue $\sigma$ is less
than 1.

Proof. The conditions imposed imply that the associated Lawton matrix is real
and satisfies the column sum condition. Hence, the division of eigenvalues into
symmetric eigenvalues of types (1) and (2) and skew-symmetric eigenvalues of
types (1) and (2) is valid.

If $\sigma$ is greater than 1 then the Lawton matrix has a real, symmetric eigenvalue
of type (2) greater than 1. It follows that the Lawton matrix does not satisfy
the eigenvalue condition stated in the theorem.

If $\sigma$ is equal to 1 then the Lawton matrix has a real, symmetric eigenvalue
of type (2) equal to 1. Of course it also has a real, symmetric eigenvalue of type
(1) equal to 1. Hence the matrix has an eigenvalue at 1 of algebraic multiplicity
greater than or equal to 2. It follows that the Lawton matrix does not satisfy the
eigenvalue condition.

If $\sigma$ is less than 1 then, by lemma 12, all of the symmetric eigenvalues of type
(2) of the Lawton matrix have modulus less than or equal to $\sigma$, i.e. less than
1. By lemma 13, $\rho = \sigma$, hence, by lemma 11, the skew-symmetric eigenvalues
of type (2) of the associated Lawton matrix also have modulus less than 1. The
skew-symmetric eigenvalue of type (1) equals 1/2 and clearly has modulus less than
1. Of course the symmetric eigenvalue of type (1) equals 1. Hence the Lawton
matrix satisfies the eigenvalue condition.

Note: the advantage of theorem 2 is that it permits us to test whether a
Lawton matrix has a simple eigenvalue at one and all other eigenvalues of
modulus less than one, not by checking all of the eigenvalues, but rather by testing
a single eigenvalue $\sigma$ which is known to be real, non-negative and symmetric of
type (2). These known properties of $\sigma$ significantly simplify the numerical task
of finding this eigenvalue.
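For comparison, the brute-force test that theorem 2 lets one avoid can be
sketched as follows: compute every eigenvalue and check the condition directly.
The function name and the numerical tolerance `tol` are assumptions made for
this sketch; in floating point "simple" can only be tested up to such a
tolerance.

```python
import numpy as np

def satisfies_eigenvalue_condition(A, tol=1e-8):
    """True if A has a simple eigenvalue at 1 and all others have modulus < 1."""
    values = np.linalg.eigvals(np.asarray(A, dtype=float))
    at_one = np.abs(values - 1.0) < tol      # eigenvalues numerically equal to 1
    others = values[~at_one]
    return bool(at_one.sum() == 1 and np.all(np.abs(others) < 1.0 - tol))

# Hypothetical diagonal test matrices (not Lawton matrices).
print(satisfies_eigenvalue_condition(np.diag([1.0, 0.5, 0.0])))   # simple eigenvalue at 1
print(satisfies_eigenvalue_condition(np.diag([1.0, 1.0, 0.5])))   # eigenvalue 1 repeated
print(satisfies_eigenvalue_condition(np.diag([1.0, -1.2])))       # modulus above 1
```

This check costs a full eigen-decomposition, whereas theorem 2 reduces the
question to locating the single real eigenvalue $\sigma$.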

6 The Lifting Scheme 

We outline a single parameter form of the lifting scheme as follows: 

Theorem 3. Take any initial set of real, finite, balanced dual filters $\{h, \tilde h\}$, i.e.
filters satisfying the biorthogonal constraint (3). Assume that these filters
generate biorthogonal Riesz bases of compactly supported wavelets. Define companion
filters $g$ and $\tilde g$ as follows:

$$G(e^{i\theta}) = e^{-i\theta}\, \overline{\tilde H(e^{i(\theta+\pi)})}, \qquad \tilde G(e^{i\theta}) = e^{-i\theta}\, \overline{H(e^{i(\theta+\pi)})} \qquad (9)$$

then a new set of finite balanced filters $\{h, \tilde h^{new}\}$, together with their
companion filters $\{\tilde g, g^{new}\}$, are generated as follows:

$$\tilde H^{new}(e^{i\theta}) = \tilde H(e^{i\theta}) + r\, \tilde G(e^{i\theta})\, \overline{S(e^{i2\theta})}$$

$$G^{new}(e^{i\theta}) = G(e^{i\theta}) - r\, H(e^{i\theta})\, S(e^{i2\theta}) \qquad (10)$$

where $S(e^{i2\theta})$ is a real trigonometric polynomial and $r$ is a real parameter.
These new filters also satisfy the biorthogonal constraint, i.e. are dual.

The question arises as to whether, for a given real trigonometric polynomial $S$
and real parameter $r$, the dual filters $\{h, \tilde h^{new}\}$ generate biorthogonal Riesz bases
of compactly supported wavelets. A simple necessary condition [8] is stated as
follows:

Lemma 14. The dual filters $\{h, \tilde h^{new}\}$ generate biorthogonal Riesz bases of
compactly supported wavelets only if $S(1) = 0$.



The principal contribution of the present work (theorem 2) leads directly to
the following result:

Theorem 4. Assuming $S(1) = 0$, the dual filters $\{h, \tilde h^{new}\}$ generate
biorthogonal Riesz bases of compactly supported wavelets for all real $r$ in an open interval
containing 0. Moreover, this interval is characterised by the facts that it is
maximal and that at any boundary points, but at no interior points, the Lawton matrix
associated with $\tilde h^{new}$ has a symmetric eigenvalue of type (2) equal to 1.

By reference to lemmas 3 and 4 it is clear that the Lawton matrix associated
with $\tilde h^{new}$ has a symmetric eigenvalue of type (2) equal to 1 iff
$\det(I - (A + BE - 2aw^T)) = 0$, where $I$ is the identity matrix. It is elementary to show that
the coefficients of the matrix $I - (A + BE - 2aw^T)$ are quadratic polynomials
in the variable $r$. Consequently the evaluation of values of $r$ for which this
determinant equals zero is a special case of the well-known quadratic eigenvalue
problem [10]. By means of the standard method of linearisation [10] this problem
may in general be converted to the problem of determining the eigenvalues of a
matrix of twice the dimension. Specifically let

$$I - (A + BE - 2aw^T) = I - C_0 - rC_1 - r^2 C_2 \qquad (11)$$

for suitable constant, real matrices $C_0$, $C_1$, $C_2$. Then, assuming $(I - C_0)$ is
non-singular, $\det(I - (A + BE - 2aw^T)) = 0$ for non-zero parameter value $r$ iff
$(1/r)$ is an eigenvalue of the higher order matrix

$$Q = \begin{bmatrix} 0 & I \\ C_2 (I - C_0)^{-1} & C_1 (I - C_0)^{-1} \end{bmatrix} \qquad (12)$$



Employing these observations yields a corollary to theorem 4 comprising a 
more readily tested stability condition. 



Corollary 5. If $S(1) = 0$ and if $(I - C_0)$ is non-singular, the dual filters $\{h, \tilde h^{new}\}$
generate biorthogonal Riesz bases of compactly supported wavelets for all real $r$
in an open interval containing 0. Moreover, if they exist, the upper bound of
this interval equals the reciprocal of the real, positive eigenvalue of $Q$ of greatest
modulus and the lower bound of this interval equals the reciprocal of the real,
negative eigenvalue of $Q$ of largest modulus.

Although corollary 5 calls for inversion of the matrix $(I - C_0)$ and determination
of eigenvalues of the potentially large matrix $Q$, numerical implementation is
facilitated by two observations: (i) $(I - C_0)$ is in general highly structured so
that its inversion requires relatively little numerical effort, (ii) one does not seek
all eigenvalues of matrix $Q$, but rather the largest real positive and largest real
negative eigenvalues only.
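The linearisation behind corollary 5 can be sketched numerically as follows.
The code builds the block matrix $Q$ from given $C_0$, $C_1$, $C_2$, keeps its real
eigenvalues, and returns the reciprocals described in the corollary. The
function name, the tolerance `imag_tol`, and the scalar test data are
assumptions made for illustration; for brevity the sketch computes all
eigenvalues of $Q$ rather than only the two extreme real ones.

```python
import numpy as np

def stability_interval(C0, C1, C2, imag_tol=1e-9):
    """Return (lower, upper) bounds on r from the real eigenvalues of Q."""
    C0, C1, C2 = (np.atleast_2d(np.asarray(C, dtype=float)) for C in (C0, C1, C2))
    n = C0.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - C0)              # assumes I - C0 is non-singular
    Q = np.block([[np.zeros((n, n)), I],
                  [C2 @ inv, C1 @ inv]])
    mu = np.linalg.eigvals(Q)
    real_mu = mu[np.abs(mu.imag) < imag_tol].real
    pos = real_mu[real_mu > imag_tol]
    neg = real_mu[real_mu < -imag_tol]
    upper = 1.0 / pos.max() if pos.size else np.inf    # largest positive eigenvalue
    lower = 1.0 / neg.min() if neg.size else -np.inf   # negative eigenvalue of largest modulus
    return lower, upper

# Hypothetical scalar case: 1 - r - 2 r^2 = (1 - 2r)(1 + r), zeros at r = 1/2 and r = -1.
lower, upper = stability_interval([[0.0]], [[1.0]], [[2.0]])
print(lower, upper)
```

For large, structured matrices an iterative eigenvalue routine targeting only
the extreme real eigenvalues would be the natural replacement for the dense
call used here.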



7 Example 

To initialise the lifting scheme select the Haar filters $h = [0, \tfrac12, \tfrac12] = \tilde h$ and their
companion filters $g = [0, -\tfrac12, \tfrac12] = \tilde g$. It is readily shown that filters $h, \tilde h$ satisfy
the biorthogonal constraint (3). Note that filters $h, \tilde h$ are real, finite and balanced
and that $H(-1) = \tilde H(-1) = 0$. They comprise a dual real finite pair of filters.
The Lawton matrix associated with both filters is:

$$\begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & \tfrac12 & 0 & 0 & 0 \\
0 & \tfrac12 & 1 & \tfrac12 & 0 \\
0 & 0 & 0 & \tfrac12 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \qquad (13)$$

It satisfies the column sum condition and has eigenvalues $0, 0, 1, \tfrac12, \tfrac12$. One
eigenvalue is 1. It is simple and strictly exceeds all other eigenvalues in
modulus. It follows from [4] that filters $\{h, \tilde h\}$ generate biorthogonal Riesz bases of
compactly supported wavelets. We apply the single parameter form of the lifting
scheme using the fixed real trigonometric polynomial:



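$$S(e^{i\theta}) = -e^{i\theta} + e^{-i\theta} \qquad (14)$$

which clearly satisfies $S(1) = 0$. The new filters become:

$$h = [0, 0, 0, \tfrac12, \tfrac12, 0, 0], \qquad \tilde g = [0, 0, 0, -\tfrac12, \tfrac12, 0, 0],$$

$$\tilde h^{new} = [0, -\tfrac{r}{2}, \tfrac{r}{2}, \tfrac12, \tfrac12, \tfrac{r}{2}, -\tfrac{r}{2}], \qquad g^{new} = [0, \tfrac{r}{2}, \tfrac{r}{2}, -\tfrac12, \tfrac12, -\tfrac{r}{2}, -\tfrac{r}{2}]. \qquad (15)$$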
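The eigenvalues and column sums quoted for the Haar Lawton matrix (13) can be
checked numerically. The sketch below assumes that the Lawton matrix has
entries $a_{2i-j}$, where $a_k = 2\sum_n h_n h_{n+k}$ and the indices $i, j$ run from
$-M$ to $M$ with $M$ equal to the filter length minus one. This convention is
inferred from matrix (13), not quoted from the paper's definition, and the
function name `lawton_matrix` is hypothetical.

```python
import numpy as np

def lawton_matrix(h):
    """Matrix with entries a[2i - j], a = 2 * autocorrelation of h (assumed convention)."""
    h = np.asarray(h, dtype=float)
    M = len(h) - 1
    a = 2.0 * np.correlate(h, h, mode="full")   # a[k + M] holds lag k, k = -M..M
    idx = np.arange(-M, M + 1)
    lag = 2 * idx[:, None] - idx[None, :]       # lag 2i - j for row i, column j
    inside = np.abs(lag) <= M                   # lags outside the support give 0
    A = np.zeros((2 * M + 1, 2 * M + 1))
    A[inside] = a[lag[inside] + M]
    return A

haar = [0.0, 0.5, 0.5]
A = lawton_matrix(haar)
print(A)
print("column sums:", A.sum(axis=0))
print("eigenvalues:", np.sort(np.linalg.eigvals(A).real))
```

Under the stated convention the output reproduces matrix (13), every column
sums to one, and the eigenvalues are 0, 0, 1/2, 1/2 and 1.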



The Lawton matrix associated with filter $\tilde h^{new}$ is, of course, Lawton
symmetric. As this matrix is 13 x 13 we elect not to write it out in full. However,
by comparing with the canonical structure of lemma 3 we can identify the
sub-matrices:

[Equations (16)-(17): the sub-matrices $A$, $B$ and the accompanying vectors of the
canonical structure, with entries polynomial in $r$; the individual entries are
illegible in the scanned source.]

$$c = 1 + 2r^2. \qquad (18)$$



The symmetric eigenvalues of type (2) are the eigenvalues of the reduced
order matrix $(A + BE - 2aw^T)$:

[Equation (19): a 6 x 6 matrix with entries polynomial in $r$; the individual
entries are illegible in the scanned source.]

The matrix has a symmetric eigenvalue of type (2) equal to 1 iff
$\det(I - (A + BE - 2aw^T)) = 0$, where $I$ is the identity matrix. With reference to (11) we note
that in the present case



I -Co 



1 0 0 0 0 0 

0 1 0 0 0 0 

0 0 1 0 0 0 

-1 0 1 0 0 

o <8>| — l l o 

0 0 0 <g>J —1 \ 



( 20 ) 



Not only is this matrix non-singular, it is also lower triangular and therefore 
readily invertible. In all cases examined by the authors the matrix (I — Co) turns 
out, not only to be non-singular, but also to be sparse and readily invertible. It is 
therefore feasible to construct matrix Q of (12) and to determine its eigenvalues. 

In the present case, however, we do not actually need to employ linearisation.
It is feasible to apply theorem 4 directly since $\det(I - (A + BE - 2aw^T))$ is readily
shown to equal, up to a constant factor, the polynomial in $r$:

$$\left(1 - \tfrac{r^2}{2}\right)(1 + r^2)(1 + 2r - 8r^2)(1 + r) \qquad (21)$$

whose real roots are: $\pm\sqrt{2}$, $-1$, $-\tfrac14$, $\tfrac12$. The maximal real open interval
containing 0 with boundary points, but no interior points, in this set is given by
$-\tfrac14 < r < \tfrac12$. Hence, for any value of $r$ between $-\tfrac14$ and $\tfrac12$ the resulting filters
$\{h, \tilde h^{new}\}$ generate biorthogonal Riesz bases of compactly supported wavelets.
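The interval can be recovered numerically from the factors of polynomial (21).
The sketch assumes the factorisation $(1 - r^2/2)(1 + r^2)(1 + 2r - 8r^2)(1 + r)$,
as read from the printed equation; any constant prefactor is irrelevant to the
roots. The function name is hypothetical.

```python
import numpy as np

def interval_around_zero(factors, imag_tol=1e-9):
    """Largest open interval containing 0 that holds no real root of the product."""
    poly = np.array([1.0])
    for coeffs in factors:                  # coefficients, highest degree first
        poly = np.polymul(poly, coeffs)
    roots = np.roots(poly)
    real_roots = roots[np.abs(roots.imag) < imag_tol].real
    lower = real_roots[real_roots < 0].max()   # closest real root below 0
    upper = real_roots[real_roots > 0].min()   # closest real root above 0
    return lower, upper

factors_21 = [
    [-0.5, 0.0, 1.0],   # 1 - r^2/2
    [1.0, 0.0, 1.0],    # 1 + r^2 (no real roots)
    [-8.0, 2.0, 1.0],   # 1 + 2r - 8r^2
    [1.0, 1.0],         # 1 + r
]
lower, upper = interval_around_zero(factors_21)
print(lower, upper)
```

The two bounds returned are the real roots nearest to zero on either side,
which is exactly the characterisation of the interval given in theorem 4.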



8 Conclusions 

We have formulated a single parameter form of the lifting scheme. We have
shown that the scheme generates biorthogonal filter banks having associated
wavelets in $L^2(\mathbb{R})$ provided the parameter lies in a certain open interval and
have developed a method for finding the largest such interval. Numerically this
method is equivalent to a special case of the quadratic eigenvalue problem. For
low order filters the method of linearisation is in general appropriate. A single
matrix inversion is required in the application of the linearisation method which
reduces the problem to a standard eigenvalue problem (or rather to the problem
of finding the largest and smallest non-zero, real eigenvalues of a matrix). It
transpires that the matrix inversion often requires relatively little numerical effort.
For high order filters more advanced techniques for solving the quadratic
eigenvalue problem (e.g. the Jacobi-Davidson method) would be required. We note
that the parameterised lifting scheme, in conjunction with this method, yields
a class of biorthogonal filter banks with associated wavelet bases in $L^2(\mathbb{R})$ and
that this class is itself parameterised. Clearly one may employ a stochastic
algorithm to determine the filter bank in this parameterised class which is optimal
with respect to some desirable property (such as maximum energy compaction,
desired shape, etc.).

References 

1. Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets. Comm.
Pure Applied Math. 41 (1988) 909-996

2. Mallat, S.: A Theory for Multiresolution Signal Decomposition: The Wavelet
Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11
(1989) 674-693

3. Cohen, A., Daubechies, I., Feauveau, J. C.: Bi-orthogonal Bases of Compactly
Supported Wavelets. Comm. Pure Applied Math. 45 (1992) 485-560

4. Cohen, A., Daubechies, I.: A Stability Criterion for Biorthogonal Wavelet Bases
and their Related Subband Coding Scheme. Duke Mathematical Journal 68 (1992)
313-335

5. Strang, G.: Eigenvalues of (↓2)H and Convergence of the Cascade Algorithm. IEEE
Transactions on Signal Processing 44 (1996) 233-238

6. Sweldens, W.: The Lifting Scheme: A Custom-Design Construction of Biorthogonal
Wavelets. Appl. Comput. Harmon. Analysis 3 (1996) 186-200

7. Lawton, W. M.: Necessary and Sufficient Conditions for Constructing Orthonormal
Wavelet Bases. Journal Math. Phys. 32 (1991) 57-61

8. McDarby, G., Curran, P., Heneghan, C., Celler, B.: Necessary Conditions on the
Lifting Scheme for Existence of Wavelets in L2(R). ICASSP, Istanbul, (2000)

9. Krein, M. G., Rutman, M. A.: Linear Operators Leaving Invariant a Cone in a
Banach Space. Functional Analysis and Measure Theory. American Mathematical
Society, Providence R. I., Translation Series 1, 10 (1962) 199-325

10. Gohberg, I., Lancaster, P., Rodman, L.: Matrix Polynomials. Academic Press, New
York, (1982)



Characterization of Dirac Edge 
with New Wavelet Transform 



Lihua Yang 1 , Xinge You 2 , Robert M. Haralick 3 , 

Ihsin T. Phillips 4 , and Yuan Y. Tang 2 

1 Department of Mathematics, Zhongshan University 
Guangzhou 510275, P. R. China 

2 Department of Computer Science, Hong Kong Baptist University 
Kowloon Tong, Hong Kong 
{yytang, xyou}@comp.hkbu.edu.hk

3 Department of Computer Science, Graduate Center, City University of New York
365 Fifth Ave., New York, NY 10016, USA
haralick@gc.cuny.edu

4 Department of Computer Science, Queens College, City University of New York
65-30 Kissena Blvd., Flushing, NY 11367 USA
yun@image.cs.qc.edu



Abstract. This paper studies the characterization of Dirac-structure
edges with a novel wavelet transform and the selection of suitable
wavelet functions to detect them. Three significant characteristics
of the local maximum modulus of the wavelet transform with respect to
Dirac-structure edges are presented. By utilizing a novel continuous
wavelet, it is proven that the local maxima of the modulus of the
continuous wavelet transform of a Dirac-structure edge form two new curves,
which are located symmetrically on the two sides of the original one and
have the same direction as it; the distance between the two curves is
estimated. An algorithm to detect curves in an image by utilizing the
above invariants is developed. Several experiments are conducted, and
positive results are obtained.



1 Introduction 

In our previous paper [7], we presented a novel method, based on the quadratic
spline wavelet, to identify different structures of edges and thereafter to extract
the Dirac-structure ones. Furthermore, a very important characterization of the
Dirac-structure edges by the wavelet transform was provided. Three significant
characteristics of the local maximum modulus of the wavelet transform with respect
to the Dirac-structure edges were presented, namely: (1) slope invariant: the
local maximum modulus of the wavelet transform of a Dirac-structure edge is
independent of the slope of the edge. (2) grey-level invariant: the local maximum
modulus of the wavelet transform with respect to a Dirac-structure edge takes
place at the same points when images with different grey-levels are
processed. (3) width light-dependent: for various widths of the Dirac-structure

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 129 138, 2001. 
c Springer- Verlag Berlin Heidelberg 2001 



130 



Lihua Yang et al. 



edge images, the location of the maximum modulus of the wavelet transform varies
slightly when the scale s of the wavelet transform is larger than the width d of
the Dirac-structure edge images. Based on these characteristics, a novel algorithm
to detect the Dirac-structure edges in an image was developed. Some
examples of applying this algorithm to detect Dirac-structure edges can be
found in [7]. However, there are some weaknesses in the method, and this paper
proposes a substantial improvement of that work.

Note that the third property in [7] is “width light-dependent”, not
“width invariant”. This means that for various widths of the Dirac-structure
edge images, the location of the maximum modulus of the wavelet transform
may change. These changes are small, but what we want is that the location of the
maximum modulus does not change at all, i.e. that it is width invariant.
Let us look at Fig. 1. The first row of Fig. 1 has
three circles. The left image is the original one, which contains a circle of
varying width. The middle one is the location of the maximum modulus of the wavelet
transform with scale s = 6, which depends on the width of the circle in some
way. Finally, by utilizing the algorithm proposed in [7], the central line of the
circle is extracted and displayed on the right of Fig. 1. We can see that the
central line of the circle is broken. The second row of Fig. 1 shows trees, where
the sizes of the branches vary: some are thick and some are thin. The left image
is the original one, and the right of Fig. 1 is the central line extracted utilizing
the algorithm proposed in [7]. It is easy to see that some branches of the tree
are lost. To overcome such a defect, in this paper a novel wavelet is utilized, so




Fig. 1. Left: the original images; Middle: the location of maximum modulus of
the wavelet transform with s = 6; Right: the central line images extracted by
the algorithm of [7]



that the above “width light-dependent” property can be improved to “width
invariant” without losing the “slope invariant” and “grey-level invariant” properties.
Due to this improvement, the detection of curves is more accurate.

This paper is organized as follows: Section 2 gives a brief review of the scale
wavelet transform, followed by the construction of a special wavelet. In Section 3,
a characterization of the Dirac-structure edges in an image by the wavelet transform
is developed. Then, in Section 4, an algorithm to extract the central line
of a curve is presented, and several experiments are illustrated. Finally,
some conclusions are provided in Section 5.



2 Continuous Wavelet Transform with New Wavelet 
Function 



Let $L^2(\mathbb{R}^2)$ be the Hilbert space of all square-integrable 2-D functions on the
plane $\mathbb{R}^2$. A function $\psi \in L^2(\mathbb{R}^2)$ is called a wavelet function if

$$\int_{\mathbb{R}} \int_{\mathbb{R}} \psi(x, y)\, dx\, dy = 0. \qquad (1)$$

For $f \in L^2(\mathbb{R}^2)$ and scale $s > 0$, the scale wavelet transform of $f(x, y)$ is defined
by

$$W_s f(x, y) := (f * \psi_s)(x, y) = \int_{\mathbb{R}} \int_{\mathbb{R}} f(u, v)\, \frac{1}{s^2}\, \psi\!\left(\frac{x-u}{s}, \frac{y-v}{s}\right) du\, dv, \qquad (2)$$

where $*$ denotes the convolution operator and $\psi_s(u, v) := \frac{1}{s^2}\psi\!\left(\frac{u}{s}, \frac{v}{s}\right)$. Obviously,
the scale wavelet transform described in Eq. (2) is a filter, and since $\psi \in L^2(\mathbb{R}^2)$,
its Fourier transform can be defined by $\hat\psi(\xi, \eta) := \int_{\mathbb{R}} \int_{\mathbb{R}} \psi(x, y)\, e^{-i(\xi x + \eta y)}\, dx\, dy$,
which satisfies $\hat\psi \in L^2(\mathbb{R}^2)$. Thus, both functions $\psi$ and $\hat\psi$
decrease at infinity. For a general theory of the scale wavelet transform,
see [2,3].

However, the wavelet transform differs from Fourier transform. There is only 
one basic function in the latter, while there exist many different wavelet functions 
in the former. Therefore, it is very important to select the one that is as “good” 
as possible according to its particular applications. 

Theoretically, Eq. (1), i.e. $\hat\psi(0, 0) = 0$, implies that $\psi(x, y)$ is a band-pass
filter rather than a high-pass one, because of the decrease of its Fourier transform at
infinity. It is easy to see that the partial derivatives of a low-pass function
are candidates for wavelet functions. In this paper, we consider wavelets of this
kind, i.e., $\psi^1(x, y) := \frac{\partial}{\partial x}\theta(x, y)$, $\psi^2(x, y) := \frac{\partial}{\partial y}\theta(x, y)$, where $\theta(u, v)$
denotes a real function satisfying: 1) $\theta(u, v)$ decreases fast at infinity; 2) $\theta(u, v)$ is
an even function in both $u$ and $v$; 3) $\hat\theta(0, 0) = 1$.

For the wavelet $\psi^1(x, y)$ defined above, the scale wavelet transform is

$$W^1_s f(x, y) = s\, \frac{\partial}{\partial x} (f * \theta_s)(x, y),$$

where $\theta_s(x, y) := \frac{1}{s^2}\theta\!\left(\frac{x}{s}, \frac{y}{s}\right)$.

This formula is equivalent to the classical multi-scale edge detection [1,5] if
$\theta(x, y)$ is set to be a Gaussian. A similar explanation can be given for the wavelet
$\psi^2(x, y)$ defined above; in that case, the partial derivative is taken along the
vertical direction instead of the horizontal one.
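The derivative form of the transform follows from the chain rule. A short
derivation, assuming the dilation convention $\psi_s(u, v) = s^{-2}\psi(u/s, v/s)$ and
$\theta_s(u, v) = s^{-2}\theta(u/s, v/s)$ (the normalisation factor is an assumption here;
any factor common to both definitions leads to the same identity):

```latex
\begin{aligned}
\psi^1_s(x, y)
  &= \frac{1}{s^2}\,\frac{\partial\theta}{\partial x}\!\left(\frac{x}{s}, \frac{y}{s}\right)
   = \frac{1}{s^2}\; s\,\frac{\partial}{\partial x}\!\left[\theta\!\left(\frac{x}{s}, \frac{y}{s}\right)\right]
   = s\,\frac{\partial\theta_s}{\partial x}(x, y), \\
W^1_s f
  &= f * \psi^1_s
   = f * \left(s\,\frac{\partial\theta_s}{\partial x}\right)
   = s\,\frac{\partial}{\partial x}\,(f * \theta_s).
\end{aligned}
```

The last step uses the fact that differentiation commutes with convolution.
The same argument with $\partial/\partial y$ gives the corresponding identity for $\psi^2$.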

The Gaussian function has long been employed in image processing. It possesses some
excellent properties, such as locality in both the time domain and the frequency
domain, the same width in both the time window and the frequency window, and
so on. These properties have led to its extensive application in
filtering, and it has almost become the default low-pass
filter in practice. Unfortunately, the Gaussian function is not always the best choice
for all applications. In fact, we have shown that it is not the best candidate for
characterizing a Dirac-structure edge [7]. Even the quadratic spline wavelet is
better, although the quadratic spline wavelet is still not perfect
for such applications: in [7] it was proved that the location of the maximum
modulus of its wavelet transform with respect to a Dirac-structure edge is not
width invariant. It still depends on the width of the edge, albeit only
slightly. To remove this dependence, a novel wavelet is constructed and used in
this paper; its definition is given below.

Let 

i>i(x) = -|(-8a;ln 1+ - 16x 2 ) 

< M*) = -|(^ln 3+ ~ JY 9 ~ 16a;2 ) 

k i>z{x) = — ^(—4x In 1+ y* 2 + |Vl - Z 2 ) 

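Then, the 1-D wavelet $\psi(x)$ is an odd function defined on $(0, \infty)$ by

$$\psi(x) := \begin{cases}
\psi_1(x) + \psi_2(x) + \psi_3(x), & x \in (0, \tfrac14) \\
\psi_2(x) + \psi_3(x), & x \in [\tfrac14, \tfrac34) \\
\psi_3(x), & x \in [\tfrac34, 1) \\
0, & x \in [1, \infty)
\end{cases} \qquad (3)$$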



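Let $\phi(x) := \int_{-\infty}^{x} \psi(t)\, dt$. Then $\phi(x)$ is an even function, compactly supported on
$[-1, 1]$, and $\phi'(x) = \psi(x)$. The smoothness function $\theta(x, y)$ is then defined by
$\theta(x, y) := \phi(\sqrt{x^2 + y^2})$, and the 2-D wavelets are defined by

$$\begin{cases}
\psi^1(x, y) := \dfrac{\partial}{\partial x}\theta(x, y) = \phi'\!\left(\sqrt{x^2 + y^2}\right) \dfrac{x}{\sqrt{x^2 + y^2}} \\[2ex]
\psi^2(x, y) := \dfrac{\partial}{\partial y}\theta(x, y) = \phi'\!\left(\sqrt{x^2 + y^2}\right) \dfrac{y}{\sqrt{x^2 + y^2}}
\end{cases} \qquad (4)$$

and are illustrated in Fig. 2.

The gradient direction and the amplitude of the wavelet transform are
denoted respectively by

$$\nabla W_s f(x, y) := \begin{pmatrix} W^1_s f(x, y) \\ W^2_s f(x, y) \end{pmatrix} \qquad (5)$$

and

$$|\nabla W_s f(x, y)| := \sqrt{|W^1_s f(x, y)|^2 + |W^2_s f(x, y)|^2}. \qquad (6)$$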



Fig. 2. The graphical descriptions of the 2-D wavelet functions: left - function $\psi^1(x, y)$;
right - function $\psi^2(x, y)$



By locating the local maxima of $|\nabla W_s f(x, y)|$, we can detect the edges of the
images.



3 Characterization of Curves through New Wavelet 
Transform 

In this section, three significant characteristics of the local maximum modulus
of the wavelet transform with respect to the Dirac-structure edges in images will
be presented, namely:

– Grey-level invariant: the local maximum modulus of the wavelet transform
with respect to a Dirac-structure edge takes place at the same points when
images with different grey-levels are processed.

– Slope invariant: the local maximum modulus of the wavelet transform of a
Dirac-structure edge is independent of the slope of the edge.

– Width invariant: for various widths of the Dirac-structure edges in an image,
the location of the maximum modulus of the wavelet transform does not vary
under certain circumstances.

The proof of the above characteristics may be obtained similarly to our previous
work [7].

Moreover, it can be concluded mathematically that the amplitude of the wavelet
transform $|\nabla W_s f(x, y)|$ reaches the local maximum if and only if the scale $s > d$.
Namely, the local maxima of $|\nabla W_s f_{l_d}(x_p, y_p)|$ occur on both sides of the central
line $l$ of $l_d$, and their distance from $l$ is $s/2$, which is independent of the width $d$. In
summary, the above three invariance properties can be rewritten as the following
theorem:

Theorem 1. Let $l_d$ be a Dirac-structure edge with width $d$ and let $l$ be its central
line. The local maxima of the modulus of the wavelet transform corresponding to the
wavelets of Eq. (4) form two new lines, which are located symmetrically on both
sides of the central line and have the same direction as it. If the scale $s > d$, then
the distance between the two new lines equals $s$.



Fig. 3. Modulus of wavelet transforms corresponding to a segment of a straight line and a
curve



This theorem describes the property of width invariance, which is important. It
improves our former results in [7]. Namely, for each scale s, the local maxima of
the modulus of the wavelet transforms with respect to curves of different widths
are located at the same positions. A couple of graphical examples are shown in
Fig. 3.

4 Algorithm and Experiments 

In this section, the algorithm for extracting the Dirac-structure edges will be
presented. Several experiments will also be conducted.

4.1 Algorithm

In practice, the wavelet transform should be calculated discretely. We have the
following formula:



r i 



W^f(n^rn)— / j f(u,v)wl(n-u,m----v)dudv 



»k -\- 1 r l - fi 



ip] (n — ip rn v)dudv 



V 

= Yl'f ( ' n ~ k ~ l ' rn ~ l "" :1 )^M > (* = C 2), 

k,l 

vffiere hl ! 1 y>l(u, v)dudv — ii/ s v)dudv, (i — 

1,2). Next, w r e give the calculating formulae of the coefficients The details 

can be found in [i,d,7]. It is deduced easily that 



/M !S,2 



isA ; S. 1 ! sA >sA ;S.l I sA 

y-’-ki = -Vk- i.i v fc :_i - v k ,_i, = - w-i i_i- 






Through further calculating, we have 1 + $ 1 + 1 , k ~ 0f+i,fe+i _ f° r 

all non-negative integers k and Z, (f) kl = fl fe2+ ^ 2 ^ — (Z/s) 2 ij)(v)dv. 

On the other hand, it is easy to see that $\phi^s_{k,l} = 0$ for all integers $k, l$
satisfying $k^2 + l^2 > s^2$, due to the compact support $[-1, 1]$ of $\phi(x)$. We can calculate
all the coefficients $\phi^s_{k,l}$ numerically for non-negative integers $k, l$. The possibly
nonzero values of $\phi^s_{k,l}$ for $s = 2, 4, 6, 8$ are listed in Tables 1-4.

Table 1. The nonzero coefficients $\{\phi^s_{k,l}\}$ for s = 2

k\l      l = 0    l = 1
k = 0    0.2500   0.1250
k = 1    0.0497   0.0111

Table 2. The nonzero coefficients $\{\phi^s_{k,l}\}$ for s = 4

k\l      l = 0    l = 1    l = 2    l = 3
k = 0    0.2500   0.2292   0.1250   0.0208
k = 1    0.1468   0.1206   0.0552   0.0060
k = 2    0.0497   0.0366   0.0111   0.0003
k = 3    0.0047   0.0026   0.0002   0

Based on the characterization of a straight line in an image developed in
Section 3, an algorithm to detect straight lines in an image can be designed. The
result is also valid for general curves since a short segment of a curve can be
regarded as a straight line approximately. In fact, wavelet transforms are
essentially local analysis. Therefore the result of Theorem 1 can be applied to
the general curves in an image. Our algorithm to detect curves in an image is
designed as follows.



Table 3. The nonzero coefficients $\{\phi^s_{k,l}\}$ for s = 6

k\l      l = 0    l = 1    l = 2    l = 3    l = 4    l = 5
k = 0    0.2500   0.2438   0.2022   0.1250   0.0478   0.0062
k = 1    0.1831   0.1718   0.1333   0.0767   0.0257   0.0025
k = 2    0.1106   0.1003   0.0723   0.0367   0.0094   0.0005
k = 3    0.0497   0.0436   0.0281   0.0111   0.0017   0.0000
k = 4    0.0133   0.0109   0.0056   0.0014   0.0000   0
k = 5    0.0011   0.0008   0.0002   0.0000   0        0



Table 4. The nonzero coefficients $\{\phi^s_{k,l}\}$ for s = 8

k\l      l = 0    l = 1    l = 2    l = 3    l = 4    l = 5    l = 6    l = 7
k = 0    0.2500   0.2474   0.2292   0.1849   0.1250   0.0651   0.0208   0.0026
k = 1    0.2006   0.1950   0.1741   0.1358   0.0884   0.0433   0.0126   0.0013
k = 2    0.1468   0.1403   0.1206   0.0902   0.0552   0.0244   0.0060   0.0004
k = 3    0.0935   0.0882   0.0733   0.0517   0.0287   0.0107   0.0020   0.0000
k = 4    0.0497   0.0462   0.0366   0.0236   0.0111   0.0032   0.0003   0
k = 5    0.0199   0.0180   0.0132   0.0072   0.0026   0.0004   0.0000   0
k = 6    0.0047   0.0041   0.0026   0.0011   0.0002   0.0000   0        0
k = 7    0.0004   0.0003   0.0001   0.0000   0        0        0        0



Algorithm 1 Let $f(x,y)$ be an image containing curves. For a scale $s > 0$,

Step 1 Calculate the wavelet transforms $\{W^1_s f(x,y), W^2_s f(x,y)\}$ with respect to the wavelets defined by Eq. (4).

Step 2 Calculate the local maxima $f_{locmax}$ of $|\nabla W_s f(x,y)|$ and the gradient direction $f_{gradient}$.

Step 3 For each local-maximum point $(x,y)$, search for the point at distance $s$ from $(x,y)$ along the gradient direction. If that point is also a local maximum, the center point is detected.

Step 4 The curves formed by all the points detected in Step 3 are the curves we seek.
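Steps 2 and 3 of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two transform components are taken as given arrays `w1` and `w2` (Step 1 is not reproduced), the gradient direction is rounded to the nearest pixel neighbour, and the "center point" is assumed to be the midpoint of the two paired maxima. All function names are hypothetical.

```python
import math

def modulus_and_angle(w1, w2):
    """Per-pixel modulus and gradient angle of the two transform components."""
    rows, cols = len(w1), len(w1[0])
    mod = [[math.hypot(w1[y][x], w2[y][x]) for x in range(cols)] for y in range(rows)]
    ang = [[math.atan2(w2[y][x], w1[y][x]) for x in range(cols)] for y in range(rows)]
    return mod, ang

def local_maxima(mod, ang):
    """Step 2: keep pixels whose modulus is not exceeded by either neighbour
    along the (rounded) gradient direction and strictly exceeds at least one."""
    rows, cols = len(mod), len(mod[0])
    peaks = set()
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            m = mod[y][x]
            if m == 0.0:
                continue
            dx = round(math.cos(ang[y][x]))
            dy = round(math.sin(ang[y][x]))
            fwd, bwd = mod[y + dy][x + dx], mod[y - dy][x - dx]
            if m >= fwd and m >= bwd and (m > fwd or m > bwd):
                peaks.add((y, x))
    return peaks

def center_points(peaks, ang, s):
    """Step 3: pair each peak with a peak at distance s along its gradient
    direction; the midpoint is taken as the center point (an assumption)."""
    centers = set()
    for (y, x) in peaks:
        for sign in (1, -1):
            x2 = round(x + sign * s * math.cos(ang[y][x]))
            y2 = round(y + sign * s * math.sin(ang[y][x]))
            if (y2, x2) in peaks:
                centers.add(((y + y2) // 2, (x + x2) // 2))
    return centers

# Synthetic check: two vertical ridges of modulus at columns 3 and 7 (s = 4).
w1 = [[1.0 if x in (3, 7) else 0.0 for x in range(11)] for _ in range(5)]
w2 = [[0.0] * 11 for _ in range(5)]
mod, ang = modulus_and_angle(w1, w2)
centers = center_points(local_maxima(mod, ang), ang, s=4)
```

In this synthetic case the two ridges are exactly $s$ apart, so every interior row yields one center point halfway between them, which corresponds to Step 4 tracing the central line.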



4.2 Experiments 

Let us return to Section 1 and look at Fig. 1. The task is to extract the central line of a circle drawn with various widths. Unfortunately, as shown in Section 1, the algorithm based on the spline wavelet in [7] does not work well because its detection depends on the width. Fortunately, as described in detail in Section 3, the new method developed in this paper is width invariant, grey-level invariant, and slope invariant. Owing to these properties, the central lines of the circle and the tree in Fig. 1 can be extracted. After applying Steps 1 and 2 of the above algorithm to the original images displayed in the left column of Fig. 1, the local maximum modulus of the wavelet transform is computed and presented in the middle column of Fig. 1. Finally, the central lines are extracted using Steps 3 and 4 of the above algorithm and presented in the right column of Fig. 1. Some further examples follow. In Fig. 5, the left image consists of a face drawn with various widths. Carrying out the algorithm of this paper extracts the central line, which is shown on the right in Fig. 5. For the Chinese character "peace", the original image, the maximum modulus image of the wavelet transform corresponding to s = 2, and the central line extracted by the proposed algorithm are shown from left to right in Fig. 6.







Fig. 4. Left: the original image; Middle: the location of maximum modulus of 
the wavelet transform corresponding to s = 6; Right: the central line extracted 
by the algorithm in this paper 




Fig. 5. Left: the original image; Middle: the maximum modulus image of the 
wavelet transform corresponding to s = 6; Right: the central line extracted by 
the proposed algorithm 





Fig. 6. Left: the original image; Middle: the location of maximum modulus of 
the wavelet transform corresponding to s = 2; Right: the central line extracted 
by the proposed algorithm 








5 Conclusions 

We have improved our previous work [7] in this paper. By utilizing a novel wavelet, we have shown three significant characteristics of the local maximum modulus of the wavelet transform with respect to Dirac-structure edges, namely:

- Slope invariant: the local maximum modulus of the wavelet transform of a Dirac-structure edge is independent of the slope of the edge.

- Grey-level invariant: the local maximum modulus of the wavelet transform with respect to a Dirac-structure edge occurs at the same points when images with different grey levels are processed.

- Width invariant: for various widths of the Dirac-structure edge images, the location of the maximum modulus of the wavelet transform does not vary when the scale s of the wavelet transform is not less than the width d of the curve.

Based on these invariance properties of the wavelet transform, an algorithm to extract Dirac-structure edges by the wavelet transform has been developed. Several experiments have been conducted, and positive results have been obtained.

References 

1. J. Canny. "A Computational Approach to Edge Detection". IEEE Trans. on Pattern Analysis and Machine Intelligence, 8:679-698, 1986.

2. C. K. Chui. An Introduction to Wavelets. Academic Press, Boston, 1992.

3. I. Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia, 1992.

4. S. Mallat and W. L. Hwang. "Singularity Detection and Processing with Wavelets". IEEE Trans. Information Theory, 38(2):617-643, March 1992.

5. D. Marr and E. C. Hildreth. "Theory of Edge Detection". In Proc. Roy. Soc., pages 187-217, London B 207, 1980.

6. Y. Y. Tang, Qi Sun, Lihua Yang, and Li Feng. "Two-Dimensional Overlap-Save Method in Handwriting Recognition". In 6th International Workshop on Frontiers in Handwriting Recognition (IWFHR'98), pages 627-633, Taejon, Korea, August 12-14 1998.

7. Y. Y. Tang, Lihua Yang, and Jiming Liu. "Characterization of Dirac-Structure Edges with Wavelet Transform". IEEE Trans. Systems, Man, Cybernetics (B), 30(1):93-109, 2000.



Wavelet Algorithm for the Numerical Solution of 
Plane Elasticity Problem 



Youjian Shen 1 * and Wei Lin 2 **

1 Department of Mathematics, Zhongshan University and Hainan Normal University
Haikou, 571158, P. R. China
syjian@hainnu.edu.cn

2 Department of Mathematics, Zhongshan University
Guangzhou, 510275, P. R. China
stslw@zsu.edu.cn



Abstract. In this paper, we apply the Shannon wavelet and the Galerkin method to the numerical solution of the natural boundary integral equation of the plane elasticity problem in the upper half-plane. A fast algorithm is given, and only 3K entries need to be computed for one 4K × 4K stiffness matrix.

Keywords: plane elasticity problem, natural integral equation, Shannon wavelet, Galerkin-wavelet method.



1 Introduction 

The plane elasticity problem arises from the plane strain problem and the plane stress problem, which are widely applied in engineering. For the plane elasticity problem in a disc, we have obtained a fast algorithm for the numerical solution by the wavelet method [1]. Now we consider the problem in the upper half-plane, which has been considered by Yu in [2], although he did not give an algorithm for the numerical solution. In this paper, as in [1], we reduce the problem to an integral equation by the natural boundary element method, which was first introduced by Kang Feng and Dehao Yu [3]. In the last decade, the natural boundary element method has been used efficiently to solve some elliptic problems [1,2,4]. One of the advantages of the natural boundary element method is that the energy functional of the original partial differential equation is preserved, which results in the unique existence and stability of the solution of the natural boundary integral equation. The kernel of the natural boundary integral equation is hypersingular in the sense of the Hadamard finite part. Many methods have been developed to deal with hypersingular integrals [1,2,5]. In this paper, we use the Galerkin-wavelet method and the Fourier transform of the singular kernel in the distribution sense to overcome the difficulty of hypersingularity. Using wavelets has become a promising numerical technique for solving partial

* Supported in part by NSF of Hainan normal university 

** Supported in part by NSF of Guangdong 



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 139-144, 2001.
© Springer-Verlag Berlin Heidelberg 2001






differential equations and integral equations in recent years ([1], [4], [6]-[8]). The wavelet we use is the Shannon wavelet, an important wavelet in signal processing which has excellent localization in frequency [9]. We find that our Galerkin-wavelet method is very suitable for solving the natural boundary integral equation of the plane elasticity problem in the upper half-plane. As a result, the computational formulae of the stiffness matrices are simple, and only 3K entries need to be computed for a 4K × 4K stiffness matrix, so our fast algorithm requires less computational cost, and the solution error is small in practical computation.

We organize this paper as follows. In Section 2, we introduce the Poisson integral formula and the natural integral equation of plane elasticity in the upper half-plane. In Section 3, we use the Galerkin-wavelet method to solve the natural integral equation and give the computational formulae of the stiffness matrices, and in Section 4, we consider the convergence of the numerical solution. Lastly, the results of numerical experiments are presented in Section 5.



2 Plane Elasticity Problem 

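We consider the second boundary value problem of the plane elasticity equation in the upper half-plane

$$\begin{cases} Lu = 0 & \text{in } \Omega := \{(x,y) \mid y > 0\}, \\ \beta u = g & \text{on } R, \end{cases} \tag{2.1}$$

where $u = (u_1, u_2)$,

$$Lu = \begin{pmatrix} a\dfrac{\partial^2}{\partial x^2} + b\dfrac{\partial^2}{\partial y^2} & (a-b)\dfrac{\partial^2}{\partial x\,\partial y} \\ (a-b)\dfrac{\partial^2}{\partial x\,\partial y} & b\dfrac{\partial^2}{\partial x^2} + a\dfrac{\partial^2}{\partial y^2} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \qquad \beta u = \left. \begin{pmatrix} -b\dfrac{\partial}{\partial y} & -b\dfrac{\partial}{\partial x} \\ (2b-a)\dfrac{\partial}{\partial x} & -a\dfrac{\partial}{\partial y} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \right|_{y=0},$$

with $a = \lambda + 2\mu$, $b = \mu$ ($\lambda, \mu$ are the Lamé constants), and $g = (g_1, g_2)$ is a given vector function on $R$ satisfying the following compatibility conditions:

$$\int_{-\infty}^{\infty} g_i(x)\,dx = 0, \quad i = 1, 2. \tag{2.2}$$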



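Set

$$W_0^1(\Omega) = \left\{ u \;\middle|\; \frac{u}{\sqrt{1+x^2+y^2}\,\ln(2+x^2+y^2)},\ \frac{\partial u}{\partial x},\ \frac{\partial u}{\partial y} \in L^2(\Omega) \right\}.$$

From Green's formula, it is not difficult to show that for any $u, v \in W_0^1(\Omega)^2$,

$$\iint_\Omega (v \cdot Lu - u \cdot Lv)\,dx\,dy = \int_R (v \cdot \beta u - u \cdot \beta v)\,dx. \tag{2.3}$$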






From this and the Green function of equation (2.1) ([2], Chapter IV) we can get the Poisson formula

$$u = P * u_0, \quad y > 0,$$

where $u_0 = u|_{y=0}$ and

$$P = \begin{pmatrix} \dfrac{y}{\pi(x^2+y^2)} + \dfrac{(a-b)\,y\,(x^2-y^2)}{\pi(a+b)(x^2+y^2)^2} & \dfrac{2(a-b)\,x y^2}{\pi(a+b)(x^2+y^2)^2} \\ \dfrac{2(a-b)\,x y^2}{\pi(a+b)(x^2+y^2)^2} & \dfrac{y}{\pi(x^2+y^2)} - \dfrac{(a-b)\,y\,(x^2-y^2)}{\pi(a+b)(x^2+y^2)^2} \end{pmatrix}. \tag{2.4}$$



Substituting (2.4) into the boundary condition $\beta u = g$, we obtain the following natural boundary integral equation of problem (2.1):

$$\mathcal{K} u_0 = g, \tag{2.5}$$

where

$$\mathcal{K} u_0 = \begin{pmatrix} -\dfrac{2ab}{\pi(a+b)x^2} & -\dfrac{2b^2}{a+b}\,\delta'(x) \\ \dfrac{2b^2}{a+b}\,\delta'(x) & -\dfrac{2ab}{\pi(a+b)x^2} \end{pmatrix} * u_0,$$

and $\delta(x)$ is the Dirac function. It is obvious that the kernel of the natural integral operator $\mathcal{K}$ possesses a second-order singularity. On the other hand, if the boundary load $g \in H^{-1/2}(R)^2$ satisfies the compatibility condition (2.2), then the natural boundary integral equation (2.5) has a unique solution in $H^{1/2}(R)^2$ [2].

Introduce the bilinear form

$$D(u_0, v_0) = \int_R v_0 \cdot \mathcal{K} u_0\,dx$$

and the linear functional

$$F(v_0) = \int_R g \cdot v_0\,dx;$$

then the natural boundary integral equation (2.5) is equivalent to the following variational problem:

$$\begin{cases} \text{find } u_0 \in H^{1/2}(R)^2 \text{ such that} \\ D(u_0, v_0) = F(v_0), \quad \forall v_0 \in H^{1/2}(R)^2. \end{cases} \tag{2.6}$$



3 Galerkin- Wavelet Methods 



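Set

$$\hat{\phi}(\xi) = \chi_{[-\pi,\pi]}(\xi), \tag{3.1}$$

then

$$\phi(x) = \frac{1}{2\pi}\int_R \hat{\phi}(\xi)\,e^{i\xi x}\,d\xi = \frac{\sin \pi x}{\pi x}. \tag{3.2}$$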






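This is well known as the scaling function of the Shannon wavelet. Now we use the Galerkin method to solve the variational problem (2.6). For $K \in N$, $j \in Z$, let

$$V_K^j = \mathrm{span}\{\phi_{j,k}(x) \mid \phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k),\ k = -K, -K+1, \dots, K-1\}.$$

Substituting $(V_K^j)^2$ for $H^{1/2}(R)^2$ in (2.6) leads to the following approximate variational problem:

$$\begin{cases} \text{find } u_0^{j,K} \in (V_K^j)^2 \text{ such that} \\ D(u_0^{j,K}, v_0^{j,K}) = F(v_0^{j,K}), \quad \forall v_0^{j,K} \in (V_K^j)^2. \end{cases} \tag{3.3}$$

We express $u_0^{j,K} = (u_{0,1}^{j,K}, u_{0,2}^{j,K})$ as

$$u_{0,i}^{j,K} = \sum_{k=-K}^{K-1} a^i_{j,k}\,\phi_{j,k}(x), \quad i = 1, 2,$$

and select $v_0^{j,K} = (\phi_{j,m}(x), 0)$ and $v_0^{j,K} = (0, \phi_{j,m}(x))$ ($m = -K, -K+1, \dots, K-1$) respectively; then we get the following linear algebraic system:

$$\begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix} \begin{pmatrix} a^1 \\ a^2 \end{pmatrix} = \begin{pmatrix} f^1 \\ f^2 \end{pmatrix}, \tag{3.4}$$

where

$$a^i = (a^i_{j,-K}, a^i_{j,-K+1}, \dots, a^i_{j,K-1})^T, \quad i = 1, 2,$$

$$Q_{ps} = (Q^{ps}_{mn})_{m,n=-K,-K+1,\dots,K-1}, \quad p, s = 1, 2,$$

$$Q^{ps}_{mn} = D\big((\delta_{1,s}, \delta_{2,s})\,\phi_{j,n}(x),\ (\delta_{1,p}, \delta_{2,p})\,\phi_{j,m}(x)\big),$$

$$f^i = (b^i_{-K}, b^i_{-K+1}, \dots, b^i_{K-1})^T, \quad b^i_m = \int_R g_i(x)\,\phi_{j,m}(x)\,dx, \quad i = 1, 2,$$

and $\delta_{i,j}$ is the Kronecker symbol.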

Theorem 1. The entries of the stiffness matrix of the linear algebraic system can be expressed as

$$Q^{11}_{mn} = Q^{22}_{mn} = \begin{cases} \dfrac{2^j ab\pi}{a+b}, & r = 0, \\[1.5ex] \dfrac{2^{j+1}ab}{\pi(a+b)r^2}\big((-1)^r - 1\big), & r \neq 0, \end{cases} \tag{3.5}$$

$$Q^{21}_{mn} = -Q^{12}_{mn} = \begin{cases} 0, & r = 0, \\[1.5ex] \dfrac{2^{j+1}b^2}{(a+b)r}(-1)^r, & r \neq 0, \end{cases} \tag{3.6}$$

where $r = m - n$.

By Theorem 1, only 3K entries need to be computed for one 4K × 4K stiffness matrix.
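The count of 3K entries can be made concrete with a short Python sketch that assembles the matrix from (3.5) and (3.6). It is an illustration only, assuming the sign relation $Q^{21}_{mn} = -Q^{12}_{mn}$ and the index range $m, n = -K, \dots, K-1$; the function name and argument names are hypothetical. The diagonal blocks are even in $r$ and vanish for even $r \neq 0$, so they need $r = 0$ and the $K$ odd values of $r$; the off-diagonal blocks are odd in $r$ and need $r = 1, \dots, 2K-1$. That gives $(K+1) + (2K-1) = 3K$ values.

```python
import math

def stiffness_matrix(j, K, lam, mu):
    """Assemble [[Q11, Q12], [Q21, Q22]] from (3.5)-(3.6).

    Returns the 4K x 4K matrix as nested lists together with the number
    of distinct entries that had to be evaluated.
    """
    a, b = lam + 2.0 * mu, mu
    size = 2 * K                      # basis indices k = -K, ..., K-1

    # Diagonal blocks: r = 0 and odd r only (even r != 0 gives zero).
    diag = {0: 2.0 ** j * a * b * math.pi / (a + b)}
    for r in range(1, size, 2):
        diag[r] = -(2.0 ** (j + 2)) * a * b / (math.pi * (a + b) * r * r)
    # Off-diagonal blocks: r = 1, ..., 2K-1 (negative r follows by oddness).
    off = {r: 2.0 ** (j + 1) * b * b * (-1) ** r / ((a + b) * r)
           for r in range(1, size)}

    def q11(r):
        return diag.get(abs(r), 0.0)

    def q21(r):
        if r == 0:
            return 0.0
        return off[r] if r > 0 else -off[-r]

    Q = [[0.0] * (2 * size) for _ in range(2 * size)]
    for m in range(size):
        for n in range(size):
            r = m - n
            Q[m][n] = Q[size + m][size + n] = q11(r)
            Q[size + m][n] = q21(r)
            Q[m][size + n] = -q21(r)
    return Q, len(diag) + len(off)

# Lame constants of the test example in Section 5: lambda = 1, mu = 0.5.
Q, n_entries = stiffness_matrix(j=0, K=2, lam=1.0, mu=0.5)
```

With these signs the assembled matrix is symmetric, as expected for a stiffness matrix generated by a symmetric bilinear form.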







4 Convergence of Numerical Solution 

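For $j \in Z$, $K \in N$, we define $L_K^j : L^2(R) \to V_K^j$ as

$$L_K^j f = \sum_{k=-K}^{K-1} \langle f, \phi_{j,k} \rangle\,\phi_{j,k}.$$

Lemma 1. For all $f \in H^1(R)$, we have

$$\lim_{j \to \infty}\,\lim_{K \to \infty} \|L_K^j f - f\|_{H^1(R)} = 0. \tag{4.1}$$

Theorem 2. If $u_0 \in H^1(R)^2$, then

$$\lim_{j \to \infty}\,\lim_{K \to \infty} \|u_0 - u_0^{j,K}\|_D = 0, \tag{4.2}$$

where $\|\cdot\|_D$ is the energy norm induced by the bilinear form $D(\cdot,\cdot)$, i.e. $\|\cdot\|_D = D(\cdot,\cdot)^{1/2}$.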



5 Numerical Results 



In this section, we present the numerical results of a test example to illustrate our algorithm for the natural boundary integral equation (2.5) discussed in Section 3.

Example Consider the problem

$$\mathcal{K} u_0 = \left( \frac{3x^3 - x}{10(x^2+1)^3},\ \frac{3x^2 - 1}{10(x^2+1)^3} \right) \quad \text{on } R.$$

Selecting the Lamé constants $\lambda = 1$, $\mu = 0.5$, the exact solution is

$$u_0 = \left( \frac{x(4x^2+1)}{30(x^2+1)^2},\ \frac{x^2 - 5}{60(x^2+1)^2} \right).$$



Table 1. Numerical results ($K = 2^j$)

j                                               1              2               3              4              5              6
$\|u_0^{j,K}(x) - u_0(x)\|_{L^2(R)}$            0.14351        0.14349         0.14345        0.14337        0.14318        0.14265
$\max_{-5<m<5} |u_0^{j,K}(m) - u_0(m)|$         1.28227e-17    2.666464e-17    2.63968e-17    4.46704e-17    4.65589e-17    5.11821e-17



The computational results of the above example show that our algorithm provides high accuracy at low computational cost.









References 

1. W. Lin, Y. J. Shen, Wavelet solutions to the natural integral equations of the plane elasticity problem, Proceedings of the Second ISAAC Congress, Vol. 2, 1471-1480 (2000), Kluwer Academic Publishers.

2. Dehao Yu, Mathematical theory of natural boundary element methods, Science Press (in Chinese), Beijing (1993).

3. K. Feng and D. Yu, Canonical integral equations of elliptic boundary value problems and their numerical solutions, Proc. of China-France Symp. on FEM, Science Press, Beijing (1983), 211-252.

4. Wensheng Chen and Wei Lin, Hadamard singular integral equations and its Hermite wavelet, Proc. of the fifth international colloquium on finite or infinite dimensional complex analysis (Z. Li, S. Wu and L. Yang, Eds.), Beijing, China (1997), 13-22.

5. C.-Y. Hui, D. Shia, Evaluations of hypersingular integrals using Gaussian quadrature, Int. J. for Numer. Meth. in Engng. 44, 205-214 (1999).

6. R. P. Gilbert and Wei Lin, Wavelet solutions for time harmonic acoustic waves in a finite ocean, Journal of Computational Acoustics Vol. 1, No. 1 (1993) 31-60.

7. C. A. Micchelli, Y. Xu and Y. Zhao, Wavelet Galerkin methods for second-kind integral equations, J. Comp. Appl. Math. 86 (1997), 251-270.

8. Tobias Von Petersdorff, Christoph Schwab, Wavelet approximations for first kind boundary integral equations on polygons, Numer. Math. 74 (1996), 479-519.

9. I. Daubechies, Ten lectures on wavelets, Capital City Press, Montpelier, Vermont, 1992.



Three Novel Models of Threshold Estimator for 
Wavelet Coefficients 



Song Guoxiang and Zhao Ruizhen

School of Science, Xidian University
Xi'an, 710071, P. R. China



Abstract. The soft-thresholding and hard-thresholding methods for estimating wavelet coefficients in wavelet threshold denoising are discussed first. To avoid the discontinuity of hard-thresholding and the biased estimation of soft-thresholding, three novel models of threshold estimator are presented: the polynomial interpolating thresholding method, the compromising method of hard- and soft-thresholding, and the modulus squared thresholding method. They all overcome the disadvantages of the hard- and soft-thresholding methods. Finally, an example is given, and the experimental results show that the improved techniques presented in this paper are efficient.



1 Introduction 

Wavelet theory has recently become a popular mathematical tool in many research fields. It throws new light on applications such as image and signal processing. In this paper, we concentrate on the problem of signal denoising. Generally, three approaches are used to distinguish noise from regular wavelet coefficients. The first is based on the principle of modulus maxima of the wavelet transform presented by Mallat [1][2]. The second is grounded on the different correlation properties of the wavelet coefficients of the noise and of the regular signal. The third is the wavelet thresholding technique presented by Donoho [3][4]. In the third case, the idea of the hard-thresholding or soft-thresholding method is to replace the small coefficients by zero and keep or shrink the large coefficients. However, the Estimated Wavelet Coefficients (EWC) obtained by hard-thresholding are not continuous at the threshold, which may induce oscillation of the reconstructed signal. In the soft-thresholding case, the EWC are mathematically tractable owing to their continuity, but when the wavelet coefficients become larger, the EWC deviate from them, and this error inevitably carries over to the reconstructed signal. The methods discussed in this paper belong to the third case, namely noise reduction based on thresholding. Combining hard-thresholding and soft-thresholding, three improved techniques are presented in this paper to avoid these disadvantages: the polynomial interpolating thresholding method, the compromising method of hard- and soft-thresholding, and the modulus squared thresholding method. The wavelet coefficients estimated

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 145-150, 2001.
© Springer-Verlag Berlin Heidelberg 2001






through two of the methods given in this paper are continuous at the threshold and nearly unbiased when the original coefficients become larger, and all three methods obtain good results. This correspondence is organized as follows. Section 2 briefly introduces some basic notation of the wavelet transform and the hard-thresholding and soft-thresholding methods. Three novel models of threshold estimator for wavelet coefficients are presented in Section 3. Finally, Section 4 gives some experimental results, and a brief conclusion is stated in Section 5.

2 Wavelet Transform and the Thresholding Method 

Suppose there is an observed signal

$$f(t) = s(t) + n(t) \tag{1}$$

where $s(t)$ is the original signal and $n(t)$ is Gaussian white noise with mean 0 and variance $\sigma^2$. If $f(t)$ is sampled as an $N$-point discrete signal $f(k)$, then the fast wavelet algorithm is

$$Sf(j+1,k) = Sf(j,k) * h(j,k) \tag{2}$$

$$Wf(j+1,k) = Sf(j,k) * g(j,k) \tag{3}$$

where $Sf(0,k)$ is the original signal $f(k)$, $Sf(j,k)$ are the approximation coefficients, $Wf(j,k)$ are the wavelet coefficients, and $h$ and $g$ are the low-pass and high-pass filters respectively. For convenience, we abbreviate $Wf(j,k)$ to $w_{j,k}$. Accordingly, the wavelet reconstruction formula is

$$Sf(j-1,k) = Sf(j,k) * h(j,k) + Wf(j,k) * g(j,k) \tag{4}$$

Due to the linearity of the wavelet transform, the wavelet coefficients $w_{j,k}$ of the observed data $f(k) = s(k) + n(k)$ consist of two parts: $Ws(j,k)$ (abbreviated to $u_{j,k}$), corresponding to $s(k)$, and $Wn(j,k)$ (abbreviated to $v_{j,k}$), corresponding to $n(k)$.
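The decomposition $w_{j,k} = u_{j,k} + v_{j,k}$ follows from linearity alone and can be checked on a toy example. The sketch below does not reproduce the filters $h(j,k)$, $g(j,k)$ of (2)-(4), which the text leaves unspecified; it swaps in one level of the decimated orthonormal Haar transform as a stand-in, and the function names are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)  # Haar filter tap

def haar_analysis(signal):
    """One analysis level: approximation and detail (wavelet) coefficients."""
    pairs = range(len(signal) // 2)
    approx = [(signal[2 * i] + signal[2 * i + 1]) * R for i in pairs]
    detail = [(signal[2 * i] - signal[2 * i + 1]) * R for i in pairs]
    return approx, detail

def haar_synthesis(approx, detail):
    """One synthesis level: inverse of haar_analysis."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * R, (a - d) * R])
    return out

s = [4.0, 2.0, 5.0, 5.0]        # "clean" signal
n = [0.5, -0.5, 0.25, 0.0]      # "noise"
f = [si + ni for si, ni in zip(s, n)]

_, u = haar_analysis(s)         # coefficients of the signal
_, v = haar_analysis(n)         # coefficients of the noise
approx_f, w = haar_analysis(f)  # coefficients of the observation
reconstructed = haar_synthesis(approx_f, w)
```

Here `w` equals `u + v` entry by entry, and `reconstructed` recovers `f` up to rounding, which is the property the thresholding step relies on.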

The idea of wavelet threshold denoising is:

1. Obtain the wavelet coefficients $w_{j,k}$ from the noisy signal $f(k)$ by using (2) and (3);

2. Determine the estimated wavelet coefficients $\hat{w}_{j,k}$ from $w_{j,k}$ by the thresholding method such that $\|\hat{w}_{j,k} - u_{j,k}\|$ is as small as possible;

3. Reconstruct the denoised signal $\hat{f}(k)$ from $\hat{w}_{j,k}$ by (4).

Donoho has presented a very concise method to estimate the wavelet coefficients $\hat{w}_{j,k}$. A proper threshold $\lambda$ is first chosen. Then the coefficients with absolute values smaller than $\lambda$ are replaced by zero, and those larger than $\lambda$ are kept in the hard-thresholding case and shrunk in the soft-thresholding case. The threshold of Donoho and Johnstone [4] is $\lambda = \sigma\sqrt{2\log N}$. Define

$$\hat{w}_{j,k} = \begin{cases} w_{j,k}, & |w_{j,k}| \ge \lambda \\ 0, & |w_{j,k}| < \lambda \end{cases} \tag{5}$$





It is called the hard-thresholding estimator. The soft-thresholding estimator is defined as

$$\hat{w}_{j,k} = \begin{cases} \mathrm{sign}(w_{j,k})\,(|w_{j,k}| - \lambda), & |w_{j,k}| \ge \lambda \\ 0, & |w_{j,k}| < \lambda \end{cases} \tag{6}$$
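The two estimators (5) and (6) and the threshold $\lambda = \sigma\sqrt{2\log N}$ translate directly into code. The following is a minimal sketch acting on a single coefficient; the function names are illustrative, and the natural logarithm is assumed in the threshold.

```python
import math

def universal_threshold(sigma, n):
    """lambda = sigma * sqrt(2 log N), natural logarithm assumed."""
    return sigma * math.sqrt(2.0 * math.log(n))

def hard_threshold(w, lam):
    """Estimator (5): keep the coefficient if |w| >= lambda, else zero."""
    return w if abs(w) >= lam else 0.0

def soft_threshold(w, lam):
    """Estimator (6): shrink the magnitude by lambda if |w| >= lambda."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - lam, w)

lam = universal_threshold(sigma=1.0, n=1024)
kept = hard_threshold(3.0, 2.0)    # unchanged: 3.0
shrunk = soft_threshold(3.0, 2.0)  # magnitude reduced by lambda: 1.0
```

The gap between `kept` and `shrunk` is exactly the threshold, which is the bias of soft-thresholding that the improved estimators try to reduce.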

Although these methods are widely used in applications, they have some underlying disadvantages. For instance, the estimated wavelet coefficients $\hat{w}_{j,k}$ from the hard-thresholding method are not continuous at the threshold $\lambda$, which may lead to oscillation of the reconstructed signal. In the soft-thresholding case, when $|w_{j,k}| \ge \lambda$, there is a deviation of $\lambda$ between $\hat{w}_{j,k}$ and $w_{j,k}$, which directly influences the accuracy of the reconstructed signal. To overcome these disadvantages of the hard-thresholding and soft-thresholding methods, we present some improved schemes in Section 3.

3 Three Novel Models of Threshold Estimators 

3.1 The Polynomial Interpolating Thresholding Method 

Because the hard-threshold estimate is not continuous and the soft-threshold estimate is biased, the applications of these methods are somewhat limited, which leaves room for improvement. A natural approach is to design an estimator for which the estimated wavelet coefficients $\hat{w}_{j,k}$ are continuous at the threshold $\lambda$ and, as $|w_{j,k}|$ increases, little deviation remains in the EWC. For example, we can design an estimator such that for $|w_{j,k}| > t$ ($t > \lambda$), $\hat{w}_{j,k}$ and $w_{j,k}$ are exactly the same. This can be realized through polynomial interpolation. The model is as follows:



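$$\hat{w}_{j,k} = \begin{cases} w_{j,k}, & |w_{j,k}| > t \\ \mathrm{sign}(w_{j,k})\,P(|w_{j,k}|), & \lambda \le |w_{j,k}| \le t \\ 0, & |w_{j,k}| < \lambda \end{cases} \tag{7}$$

where $P(|w_{j,k}|)$ is an interpolating polynomial. Generally $P$ can be a quadratic or cubic polynomial. The corresponding interpolating conditions are

$$\begin{cases} P(\lambda) = 0 \\ P(t) = t \\ P'(t) = 1 \end{cases} \quad \text{and} \quad \begin{cases} P(\lambda) = 0 \\ P'(\lambda) = 0 \\ P(t) = t \\ P'(t) = 1 \end{cases} \tag{8}$$

respectively. Simple derivations lead to the quadratic polynomial

$$P(x) = -\frac{1}{(t-\lambda)^2}\left[\lambda x^2 - (\lambda^2 + t^2)x + \lambda t^2\right], \quad (\lambda \le x \le t) \tag{9}$$

and the cubic polynomial

$$P(x) = -\frac{1}{(t-\lambda)^3}\left[(t+\lambda)x^3 - 2(t^2 + t\lambda + \lambda^2)x^2 + \lambda(4t^2 + t\lambda + \lambda^2)x - 2t^2\lambda^2\right], \quad (\lambda \le x \le t) \tag{10}$$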
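The polynomials (9) and (10) can be checked against the interpolating conditions (8) numerically. The sketch below implements both polynomials and the estimator (7) for a single coefficient; the function names and the choice $\lambda = 1$, $t = 2$ are illustrative assumptions.

```python
import math

def quadratic_p(x, lam, t):
    """Quadratic interpolant (9): P(lam) = 0, P(t) = t, P'(t) = 1."""
    return -(lam * x * x - (lam * lam + t * t) * x + lam * t * t) / (t - lam) ** 2

def cubic_p(x, lam, t):
    """Cubic interpolant (10): additionally P'(lam) = 0."""
    bracket = ((t + lam) * x ** 3
               - 2.0 * (t * t + t * lam + lam * lam) * x * x
               + lam * (4.0 * t * t + t * lam + lam * lam) * x
               - 2.0 * t * t * lam * lam)
    return -bracket / (t - lam) ** 3

def interpolating_threshold(w, lam, t, p=cubic_p):
    """Estimator (7): identity above t, polynomial bridge on [lam, t], zero below lam."""
    m = abs(w)
    if m > t:
        return w
    if m < lam:
        return 0.0
    return math.copysign(p(m, lam, t), w)

lam, t, h = 1.0, 2.0, 1e-6
# One-sided difference quotients approximate the derivatives at the endpoints.
slope_at_t = (cubic_p(t, lam, t) - cubic_p(t - h, lam, t)) / h
slope_at_lam = (cubic_p(lam + h, lam, t) - cubic_p(lam, lam, t)) / h
```

For these values `slope_at_t` is close to 1 and `slope_at_lam` is close to 0, so the cubic bridge joins both neighbouring pieces of (7) smoothly, while the quadratic one is only continuous at $\lambda$.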







The estimated wavelet coefficients $\hat{w}_{j,k}$ obtained from the above method are continuous everywhere. Moreover, if $P(x)$ is the cubic polynomial, then $\hat{w}_{j,k}$ is differentiable on the whole domain as well. For $|w_{j,k}| > t$, $\hat{w}_{j,k}$ is an unbiased estimate, which remedies the shortcoming of soft-thresholding.



3.2 The Compromising Method of the Hard- and Soft-Thresholding 



Define

$$\hat{w}_{j,k} = \begin{cases} \mathrm{sign}(w_{j,k})\,(|w_{j,k}| - \alpha\lambda), & |w_{j,k}| \ge \lambda \\ 0, & |w_{j,k}| < \lambda \end{cases}, \quad 0 \le \alpha \le 1. \tag{11}$$

This model of estimator for wavelet coefficients is called the compromising method of hard- and soft-thresholding. In particular, (11) reduces to the hard-thresholding (5) if $\alpha = 0$ and to the soft-thresholding (6) if $\alpha = 1$. For $0 < \alpha < 1$, the values $\hat{w}_{j,k}$ given by (11) lie between those given by (5) and (6), hence the name.

This method is quite efficient in noise reduction although it is simple and straightforward. This is no surprise if we look at the thresholding method itself. In the soft-thresholding case, the absolute value of the estimated coefficient $\hat{w}_{j,k}$ is always smaller than that of $w_{j,k}$ by $\lambda$ (when $|w_{j,k}| \ge \lambda$). Therefore, this deviation should be reduced. However, zero deviation (corresponding to hard-thresholding) is not the best choice either, because $|w_{j,k}|$ is larger than $|u_{j,k}|$ in most cases, since $w_{j,k}$ consists of $u_{j,k}$ and $v_{j,k}$, while our aim is to find $\hat{w}_{j,k}$ such that $\|\hat{w}_{j,k} - u_{j,k}\|$ is minimal. Therefore, the value of $|\hat{w}_{j,k}|$ should lie between $|w_{j,k}| - \lambda$ and $|w_{j,k}|$, which makes $\hat{w}_{j,k}$ closer to $u_{j,k}$. Based on this idea, we add a factor $\alpha$ to the soft-thresholding estimator (6) to improve the performance, where $\alpha$ is any real number between 0 and 1. An appropriate $\alpha$ may improve the denoising result. In this correspondence, we choose $\alpha = 0.5$.
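The estimator (11) and its two limiting cases can be illustrated with a few lines of Python. This is a sketch for a single coefficient with a hypothetical function name; the default $\alpha = 0.5$ matches the value chosen in the text.

```python
import math

def compromise_threshold(w, lam, alpha=0.5):
    """Estimator (11): shrink the magnitude by alpha * lambda when |w| >= lambda."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - alpha * lam, w)

as_hard = compromise_threshold(3.0, 2.0, alpha=0.0)  # 3.0, same as (5)
as_soft = compromise_threshold(3.0, 2.0, alpha=1.0)  # 1.0, same as (6)
between = compromise_threshold(3.0, 2.0)             # 2.0 with alpha = 0.5
```

Note that for $0 < \alpha < 1$ the estimator is still discontinuous at $|w_{j,k}| = \lambda$, where it jumps from 0 to $(1-\alpha)\lambda$; only the size of the jump is reduced relative to hard-thresholding.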



3.3 The Modulus Squared Thresholding Method 

We first consider the case $w_{j,k} > 0$, then generalize the result to $w_{j,k} < 0$. In the soft-thresholding method, (6) is equivalent to

$$\hat{w}_{j,k} = \begin{cases} \lambda\,(w_{j,k}/\lambda - 1), & w_{j,k}/\lambda \ge 1 \\ 0, & w_{j,k}/\lambda < 1 \end{cases} \tag{12}$$

when $w_{j,k} > 0$. If we regard $w_{j,k}/\lambda$ as a whole, then (12) means that when $w_{j,k}/\lambda \ge 1$, $w_{j,k}$ can be regarded as a coefficient of the signal and hence is kept; otherwise $w_{j,k}$ should be removed since it is considered a coefficient of the noise. Although it is equivalent to (6), (12) is easier to extend. We can modify (12) into the following model:

$$\hat{w}_{j,k} = \begin{cases} \lambda\sqrt{(w_{j,k}/\lambda)^2 - 1}, & w_{j,k}/\lambda \ge 1 \\ 0, & w_{j,k}/\lambda < 1 \end{cases} \tag{13}$$







The difference between (13) and (12) is that in (13) $w_{j,k}/\lambda$ appears in squared form. The advantage of this modification is that if $w_{j,k}/\lambda$ is above 1, its square becomes larger; if $w_{j,k}/\lambda$ is below 1, its square becomes smaller. Such a procedure speeds up the separation of noise from signal.

(13) holds only if $w_{j,k} > 0$. For the general case we have

$$\hat{w}_{j,k} = \begin{cases} \mathrm{sign}(w_{j,k})\sqrt{w_{j,k}^2 - \lambda^2}, & |w_{j,k}| \ge \lambda \\ 0, & |w_{j,k}| < \lambda \end{cases} \tag{14}$$

It is easy to prove that when $|w_{j,k}| \ge \lambda$,

$$|w_{j,k}| - \lambda \le \sqrt{w_{j,k}^2 - \lambda^2} \le |w_{j,k}| \tag{15}$$

holds. From (15) we know that the value of $\hat{w}_{j,k}$ estimated by (14) still lies between those given by (5) and (6). When $|w_{j,k}| \ge \lambda$, $\hat{w}_{j,k}$ is a nonlinear function of $w_{j,k}$, and $\hat{w}_{j,k}$ becomes closer and closer to $w_{j,k}$ as $|w_{j,k}|$ increases.
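A short Python sketch of the estimator (14), together with a spot check of the inequality (15), may help to see how it sits between the soft and hard estimates. The function name and the sample values are illustrative.

```python
import math

def modulus_squared_threshold(w, lam):
    """Estimator (14): sign(w) * sqrt(w^2 - lambda^2) when |w| >= lambda."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(math.sqrt(w * w - lam * lam), w)

lam = 3.0
samples = [3.0, 3.5, 5.0, 50.0, -5.0]
estimates = [modulus_squared_threshold(w, lam) for w in samples]
# Inequality (15): soft estimate <= |estimate| <= hard estimate.
bounded = all(abs(w) - lam <= abs(e) <= abs(w) for w, e in zip(samples, estimates))
# The gap to the original coefficient shrinks as |w| grows.
gaps = [abs(w) - abs(e) for w, e in zip(samples[:4], estimates[:4])]
```

Unlike (11) with $\alpha < 1$, this estimator is continuous at $|w_{j,k}| = \lambda$, since the square root vanishes there.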

4 Experimental Results 

A comparison of the above threshold methods in signal denoising is made. Instead of the fixed threshold $\lambda = \sigma\sqrt{2\log N}$ presented by Donoho, we take different thresholds $\lambda_j = \sigma\sqrt{2\log N}/\log(j+1)$ at different scales $j$. A noisy signal is processed by the above methods. Before denoising, the signal-to-noise ratio (SNR) is 8.226270. Table 1 shows the comparison of the SNR and the relative mean square error (RMSE) of the reconstructed signal for these methods. From Table 1, we can see that the compromising method of hard- and soft-thresholding and the modulus squared thresholding method are clearly superior to the hard-thresholding and soft-thresholding methods. The polynomial interpolating thresholding method is superior to the hard-thresholding method and roughly equivalent to the soft-thresholding method.
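The scale-dependent thresholds $\lambda_j$ can be tabulated as follows. This is a sketch under the assumption that both logarithms are natural logarithms (the text does not state the base); the function name is hypothetical.

```python
import math

def scale_thresholds(sigma, n, levels):
    """lambda_j = sigma * sqrt(2 log N) / log(j + 1) for j = 1, ..., levels."""
    base = sigma * math.sqrt(2.0 * math.log(n))
    return {j: base / math.log(j + 1) for j in range(1, levels + 1)}

thresholds = scale_thresholds(sigma=1.0, n=1024, levels=3)
```

Under this assumption the threshold decreases with the scale index $j$, and at $j = 1$ it is larger than the fixed threshold because $\log 2 < 1$.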

Table 1. Comparison of estimators for wavelet coefficients by SNR and RMSE

Estimator                        SNR          RMSE
Soft-thresholding                15.276322    0.172260
Hard-thresholding                14.331342    0.192058
Square Interpolating             15.152992    0.174723
Cubic Interpolating              15.288729    0.172014
Compromising of Hard and Soft    15.582417    0.166295
Modulus Squared Method           15.367344    0.170464



5 Conclusion 

We point out that the thresholds $\lambda_j$ used in this paper are not optimal. If the $\lambda_j$ are properly selected, the superiority of our methods will be more remarkable. In addition, for different $\lambda_j$, the experimental results may be slightly different. However, from a mass







of experiments the authors have made, it can be concluded that the pure hard- or soft-thresholding method has poor stability and depends strongly on $\lambda_j$. Moreover, at least one of the two methods fails to reach a satisfactory result. By comparison, the modulus squared thresholding method and the polynomial interpolating thresholding method are more stable and obtain nearly the same results as the better of the hard- and soft-thresholding methods. Finally, whatever $\lambda_j$ is, the compromising method of hard- and soft-thresholding is clearly superior to the hard- or soft-thresholding method.

In addition, we have only improved the thresholding method itself. There are other problems in wavelet threshold denoising, such as the selection of the threshold $\lambda$ and nonstationary noise, for example the Poisson noise case. Some achievements have been made in [5], [6], [7].



References 

1. Mallat S. and Zhong S. Characterization of signals from multiscale edges. IEEE Trans. on PAMI, 1992, 14(7):710-732

2. Mallat S. and Hwang W. L. Singularity detection and processing with wavelets. IEEE Trans. on IT, 1992, 38(2):617-643

3. Donoho D. L. De-noising by soft-thresholding. IEEE Trans. on IT, 1995, 41(3):613-627

4. Donoho D. L. and Johnstone I. M. Ideal spatial adaptation via wavelet shrinkage. Biometrika, 1994, 81:425-455

5. Jansen M. and Bultheel A. Multiple wavelet threshold estimation by generalized cross validation for images with correlated noise. IEEE Trans. on IP, 1999, 8(7):947-953

6. Nowak R. D. and Baraniuk R. G. Wavelet-domain filtering for photon imaging systems. IEEE Trans. on IP, 1999, 8(5):666-678

7. Ching P. C., So H. C. and Wu S. Q. On wavelet denoising and its applications to time delay estimation. IEEE Trans. on SP, 1999, 47(10):2879-288




The PSD of the Wavelet-Packet Modulation 



Mingqi Li 1 , Qicong Peng 2 , and Shouming Zhong 1

1 Applied Mathematics Department of the University of Electronic Science and Technology of China
Chengdu, 610054, P. R. China
lmqi2001@yahoo.com.cn

2 Institute of Communication & Information Engineering, University of Electronic Science and Technology of China,
Chengdu, 610054, P. R. China



Abstract. In the wavelet-packet modulation scheme, wavelet packets are used as carriers, and the information to be transmitted is encoded, via an inverse wavelet packet transform, as the coefficients of the wavelet packets. The power spectral density (PSD) of the modulated signals describes the properties of the modulation in the frequency domain, and has been studied by many researchers through simulation. In this paper, a formula for the PSD of the modulated signals is derived. Characteristics of the modulated signals, such as spread spectrum and spectral flatness, are shown from the formula.



1 Introduction 

The wavelet and wavelet-packet transforms have desirable characteristics, such as
localization in time and frequency and orthogonality across scale and translation,
which give rise to many useful properties. Wavelet-packet modulation is one of
their important applications: wavelet packets are used as the waveforms for
information transmission. This new kind of modulation scheme generalizes the
traditional baseband modulation scheme. In fact, both the rectangular pulse
and the sine pulse are scaling functions, corresponding to the Haar and Meyer
wavelets respectively. On the other hand, wavelet-packet modulation can be seen
as a new kind of multiplexing in which the waveforms overlap in both time and
frequency, with TDM and FDM as special cases according to [1].

In the wavelet-packet modulation scheme, each user is assigned a set of
waveforms from a group of wavelet packets. The information of each user is impressed on
the corresponding waveforms via the coefficients. At the receiver, the desired
signal is recovered by cross-correlation with a known reference signal in the wavelet
packet basis.

Spread spectrum and spectral flatness are very important in a communication
system, especially with respect to channel fading and security. Simulations of
wavelet-packet modulation, reported in many references, show several advantages for
communication, including covert and featureless waveforms (see [2], [3], [4], [5]).
Obtaining the PSD of the modulated signals in closed form is therefore very important
in applications. In this paper, the formula for the PSD of the wavelet-packet modulated

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 151 156, 2001. 
c Springer- Verlag Berlin Heidelberg 2001 






signals is derived (Theorem 1). As a special case of wavelet-packet modulation,
the PSD of wavelet modulation (Theorem 2) is obtained, too.

We now briefly summarize the concepts of wavelets and wavelet packets. A
multiresolution analysis consists of a sequence of embedded subspaces
of the space of finite-energy signals $L^2(R)$, that is,

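$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$

Each subspace $V_j$ has an orthonormal basis $\{\varphi_{j,k} : \varphi_{j,k}(t) = 2^{j/2}\varphi(2^j t - k),\ k \in Z\}$.
The function $\varphi(t)$ is called the scaling function and satisfies
$\varphi(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} h[k]\,\varphi(2t - k)$. If the
functions $\{\varphi_{j,k} : k \in Z\}$ form an orthonormal basis of the space $V_j$, the following
orthonormality constraints on $h[n]$ must be satisfied:

$$\sum_{k=-\infty}^{\infty} h[k-2n]\,h[k-2m] = \delta_{m,n}, \qquad \sum_{k=-\infty}^{\infty} h[k] = \sqrt{2}. \qquad (1)$$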



Then we get a quadrature mirror filter (QMF) pair $h[n]$, $g[n] := (-1)^n h[1-n]$.
The function $\psi(t)$, defined by $\psi(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} g[k]\,\varphi(2t - k)$, is called the wavelet
function induced by $\varphi(t)$.
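The constraints (1) and the QMF relation can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, the filters are assumed to be finitely supported on $0, \ldots, L-1$, and the Haar and Daubechies-4 coefficients are used only as familiar test inputs.

```python
import math

def shift_inner(h, s):
    """Sum over k of h[k] * h[k + 2s] for a filter supported on 0..len(h)-1."""
    return sum(h[k] * h[k + 2 * s]
               for k in range(len(h)) if 0 <= k + 2 * s < len(h))

def satisfies_constraint_1(h, tol=1e-12):
    """Check the two conditions in (1): double-shift orthonormality and sum sqrt(2)."""
    sum_ok = abs(sum(h) - math.sqrt(2)) < tol
    orth_ok = all(abs(shift_inner(h, s) - (1.0 if s == 0 else 0.0)) < tol
                  for s in range(len(h) // 2 + 1))
    return sum_ok and orth_ok

def qmf_highpass(h):
    """Return g as a dict n -> g[n], with g[n] = (-1)^n h[1 - n]."""
    L = len(h)
    return {n: (1 if n % 2 == 0 else -1) * h[1 - n] for n in range(2 - L, 2)}

# Example inputs (assumptions of this sketch, not taken from the paper).
haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
r3 = math.sqrt(3)
db4 = [(1 + r3) / (4 * math.sqrt(2)), (3 + r3) / (4 * math.sqrt(2)),
       (3 - r3) / (4 * math.sqrt(2)), (1 - r3) / (4 * math.sqrt(2))]
```

For the Haar filter the derived high-pass filter has only two taps, $g[0] = h[1]$ and $g[1] = -h[0]$, which is the familiar difference filter.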

Following Coifman, we define the recursive function sequence $\{p_n(t)\}$ by

$$p_{2n}(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} h[k]\,p_n(2t - k), \qquad p_{2n+1}(t) = \sqrt{2}\sum_{k=-\infty}^{\infty} g[k]\,p_n(2t - k), \qquad (2)$$

where $p_0(t) = \varphi(t)$ and $p_1(t) = \psi(t)$.

We also need the following notation:



$$\phi_{l,m}(t) := 2^{(N-l)/2}\,p_m(2^{N-l}t) \qquad (3)$$

$$U_l^m := \mathrm{Clos}\{2^{(N-l)/2}\,p_m(2^{N-l}t - k) : k \in Z\}, \quad m \in Z_+. \qquad (4)$$

Then we get $U_l^m = U_{l+1}^{2m} \oplus U_{l+1}^{2m+1}$, $V_N = U_0^0$ and $W_N = U_0^1$. In order to
decompose the subspace $V_N = U_0^0$, the set of subspaces $U_l^m$ may be organized
as a binary tree, where $U_0^0$ is at the top and $U_l^m$ is the $(m+1)$th node
of level $l$. There are $2^l$ nodes in level $l$. We can grow or prune the tree
in any desired fashion, and each different tree provides a different set of basis
functions.
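Whether a chosen set of terminal nodes is a legitimate pruning of this binary tree can be tested mechanically. The sketch below rests on an assumption that is not spelled out in the text: node $(l, m)$ is identified with the dyadic index interval $[m/2^l, (m+1)/2^l)$, so that splitting a node halves its interval, and a set of terminal nodes is valid exactly when the intervals tile $[0, 1)$ without gaps or overlaps. The function names are hypothetical.

```python
from fractions import Fraction

def children(l, m):
    """The two nodes obtained by splitting node (l, m)."""
    return (l + 1, 2 * m), (l + 1, 2 * m + 1)

def is_valid_pruning(nodes):
    """True if the terminal nodes (l, m) tile the index interval [0, 1) exactly."""
    if any(not 0 <= m < 2 ** l for l, m in nodes):
        return False
    intervals = sorted((Fraction(m, 2 ** l), Fraction(m + 1, 2 ** l))
                       for l, m in nodes)
    position = Fraction(0)
    for low, high in intervals:
        if low != position:      # a gap or an overlap
            return False
        position = high
    return position == 1
```

For instance, keeping node (1, 0) and splitting node (1, 1) into (2, 2) and (2, 3) gives a valid set, whereas dropping (2, 3) leaves part of the space uncovered.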



2 The Modulation and Its PSD 

First, we consider a TDM system in which there are $K_{l,m}$ independent binary
message signals interlaced with each other. Between two consecutive binary
symbols of the same message, there are $K_{l,m} - 1$ other binary symbols, one from each
of the other message signals. The combined sequence forms a composite sequence
of binary symbols $\sigma_{l,m}[n]$, where $\sigma_{l,m}[n] = \pm 1$. The system we propose here
represents the binary symbols 1 and -1 by $\phi_{l,m}(t)$ and $-\phi_{l,m}(t)$
respectively. Then the modulated signal of the TDM sequence $\{\sigma_{l,m}[n]\}$, encoded
by $\phi_{l,m}(t - 2^{l-N}n)$, is given by

$$s_{l,m}(t) = \sum_{k=-\infty}^{\infty} \sigma_{l,m}[k]\,\phi_{l,m}(t - 2^{l-N}k). \qquad (5)$$

Let the set $M$ of pairs $(l,m)$ satisfy $\bigoplus_{(l,m)\in M} U_l^m = V_N$. Since all the constituent
terminal functions in a given tree structure $M$ are orthogonal to each other,
we may employ all of these functions to carry binary data from different TDM
groups of users. So the total number of users is $\sum_{(l,m)\in M} K_{l,m}$. Let $\sigma^k_{l,m}[n]$ represent
the information sequence of the $k$th user, whose assigned waveform is
$\phi_{l,m}(t - 2^{l-N}(nK_{l,m} + k))$ in $U_l^m$. The modulated signal $s(t)$ is

$$s(t) = \sum_{(l,m)\in M}\ \sum_{k=1}^{K_{l,m}}\ \sum_{n=-\infty}^{\infty} \sigma^k_{l,m}[n]\,\phi_{l,m}(t - 2^{l-N}(nK_{l,m} + k)). \qquad (6)$$
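A discrete-time toy version of this scheme may help fix ideas. The sketch below makes several simplifying assumptions that are not in the paper: Haar filters, a full packet tree, one symbol per waveform (a single block rather than the doubly infinite sums of (6)), and an ideal noiseless channel. The symbols are placed on the packet coefficients, the transmitted block is obtained by an inverse packet transform, and the receiver recovers each symbol by correlating with the corresponding waveform. All function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

def synthesize(coeffs):
    """Inverse Haar wavelet-packet transform over a full tree.

    coeffs holds one coefficient per terminal node; its length must be a
    power of two. Returns the same number of time samples.
    """
    bands = [[c] for c in coeffs]
    while len(bands) > 1:
        merged = []
        for low, high in zip(bands[0::2], bands[1::2]):
            out = []
            for a, d in zip(low, high):
                out += [S * (a + d), S * (a - d)]   # orthogonal 2x2 butterfly
            merged.append(out)
        bands = merged
    return bands[0]

def waveform(index, size):
    """Discrete waveform of packet `index`: synthesis of a unit coefficient."""
    unit = [0.0] * size
    unit[index] = 1.0
    return synthesize(unit)

def demodulate(signal):
    """Recover the coefficients by correlating with every packet waveform."""
    size = len(signal)
    return [sum(x * w for x, w in zip(signal, waveform(i, size)))
            for i in range(size)]

symbols = [1, -1, -1, 1, 1, 1, -1, 1]      # one +/-1 symbol per user
transmitted = synthesize(symbols)
recovered = [round(c) for c in demodulate(transmitted)]
```

Because every merging step is an orthogonal map, the waveforms are orthonormal and the correlation receiver returns the transmitted symbols exactly, up to rounding error.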

We make the following reasonable assumptions to simplify the calculation of
the PSD:

1. The information sequence $\{\sigma^k_{l,m}[n]\}$ of each user is a stationary process;

2. The sequences $\{\sigma^k_{l,m}[n]\}$ of different users (different $(l,m)$ or $k$) are statistically
independent and $E(\sigma^k_{l,m}[n]) = 0$.

We denote the correlation coefficient of $\{\sigma^k_{l,m}[n] : n \in Z\}$ as

$$R^k_{l,m}[h] := E\big(\sigma^k_{l,m}[n+h]\,\sigma^{k*}_{l,m}[n]\big). \qquad (7)$$



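Then we have

$$E(s(t+\tau)s^*(t)) = E\Big(\Big(\sum_{(l,m)\in M}\sum_{k=1}^{K_{l,m}}\sum_{n=-\infty}^{\infty} \sigma^k_{l,m}[n]\,\phi_{l,m}(t - 2^{l-N}(nK_{l,m}+k))\Big)^{*}\Big(\sum_{(a,b)\in M}\sum_{c=1}^{K_{a,b}}\sum_{d=-\infty}^{\infty} \sigma^c_{a,b}[d]\,\phi_{a,b}(t+\tau - 2^{a-N}(dK_{a,b}+c))\Big)\Big),$$

so, by assumptions 1 and 2, we get

$$E(s(t+\tau)s^*(t)) = \sum_{(l,m)\in M}\sum_{k=1}^{K_{l,m}}\sum_{n=-\infty}^{\infty}\sum_{h=-\infty}^{\infty} R^k_{l,m}[h]\,\phi_{l,m}(t - 2^{l-N}(nK_{l,m}+k))\,\phi_{l,m}(t+\tau - 2^{l-N}((n+h)K_{l,m}+k)).$$

Define the function $g(t)$:

$$g(t) := \sum_{(l,m)\in M}\sum_{k=1}^{K_{l,m}}\sum_{n=-\infty}^{\infty}\sum_{h=-\infty}^{\infty} R^k_{l,m}[h]\,\phi_{l,m}(t - 2^{l-N}(nK_{l,m}+k))\,\phi_{l,m}(t+\tau - 2^{l-N}((n+h)K_{l,m}+k)). \qquad (8)$$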

When $K_{l,m}$ is a constant $K$ independent of $l, m$, every waveform carries the same
number of users. Then $g(t)$ is a periodic function, that is, $g(t) = g(t + K)$.
It follows from the assumptions above that the stochastic process $s(t)$ is
cyclostationary in the wide sense with period $T = K_{l,m} = K$. So the correlation function of $s(t)$
can be defined as

$$R(\tau) := \frac{1}{K}\int_0^K g(t)\,dt. \qquad (9)$$



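Then the $R(\tau)$ of $s(t)$ is

$$R(\tau) = \frac{1}{K}\sum_{(l,m)\in M}\sum_{k=1}^{K}\sum_{h=-\infty}^{\infty}\sum_{n} \int_{-2^{l-N}(nK+k)}^{K-2^{l-N}(nK+k)} R^k_{l,m}[h]\,\phi_{l,m}(u)\,\phi_{l,m}(u+\tau - 2^{l-N}hK)\,du$$

$$\phantom{R(\tau)} = \frac{1}{K}\sum_{(l,m)\in M}\sum_{k=1}^{K}\sum_{h=-\infty}^{\infty} 2^{N-l}\,R^k_{l,m}[h]\int_{-\infty}^{\infty}\phi_{l,m}(u)\,\phi_{l,m}(u+\tau - 2^{l-N}hK)\,du.$$

So we get the PSD of $s(t)$, that is,

$$R(\omega) = \frac{1}{K}\sum_{(l,m)\in M}\sum_{k=1}^{K}\sum_{h=-\infty}^{\infty} 2^{N-l}\,R^k_{l,m}[h]\,|\hat\phi_{l,m}(\omega)|^2\,e^{-j\omega h 2^{l-N}K} = \frac{1}{K}\sum_{(l,m)\in M}\sum_{k=1}^{K} 2^{N-l}\,|\hat\phi_{l,m}(\omega)|^2\sum_{h=-\infty}^{\infty} R^k_{l,m}[h]\,e^{-j\omega h 2^{l-N}K}.$$

So $R(\omega)$ is obtained:

$$R(\omega) = \frac{1}{K}\sum_{(l,m)\in M}\sum_{k=1}^{K} 2^{N-l}\,|\hat\phi_{l,m}(\omega)|^2\,R^k_{l,m}(2^{l-N}K\omega), \qquad (10)$$

where $R^k_{l,m}(\omega) := \sum_{h=-\infty}^{\infty} R^k_{l,m}[h]\,e^{-j\omega h}$.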

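Summarizing the description above, we get the following theorem:

Theorem 1. If assumptions (1) and (2) are satisfied and $K_{l,m}$ is a constant $K$
independent of $l, m$, the PSD of $s(t)$ is

$$R(\omega) = \frac{1}{K}\sum_{(l,m)\in M}\sum_{k=1}^{K} 2^{N-l}\,|\hat\phi_{l,m}(\omega)|^2\,R^k_{l,m}(2^{l-N}K\omega), \qquad (11)$$

where $R^k_{l,m}(\omega) = \sum_{h=-\infty}^{\infty} R^k_{l,m}[h]\,e^{-j\omega h}$.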







When we select the waveforms of nodes $(l, 1)$, the wavelet-packet modulation
turns into wavelet modulation, and a similar result is easily obtained.

Now we discuss a wavelet modulation with $m_0$ users and the waveforms
$\psi_{l,n}(t) = 2^{l/2}\psi(2^l t - n)$, $m_1 \le l \le m_0 + m_1$, $m_1 \in Z$.
The modulated signal $s(t)$ is

$$s(t) = \sum_{l=m_1}^{m_0+m_1}\ \sum_{n=-\infty}^{\infty} \sigma_l[n]\,\psi_{l,n}(t), \qquad (12)$$

where $\sigma_l[n] = \pm 1$ is the $n$th datum of the $l$th user. We denote

$$R_l[k] := E(\sigma_l[n+k]\,\sigma_l^*[n]), \qquad R_l(\omega) := \sum_{k=-\infty}^{\infty} R_l[k]\,e^{-j\omega k}, \qquad R(t+\tau, t) := E(s(t+\tau)s^*(t)). \qquad (13)$$

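Theorem 2. If each $\{\sigma_l[n] : n \in Z\}$ is a stationary process and the sequences
$\{\sigma_l[n] : n \in Z\}$ of different users are statistically independent with $E(\sigma_l[n]) = 0$,
the PSD of $s(t)$ is

$$R(\omega) = \sum_{l=m_1}^{m_0+m_1} |\hat\psi(2^{-l}\omega)|^2\,R_l(2^{-l}\omega). \qquad (14)$$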




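Proof. The correlation function of $s(t)$ is

$$R(\tau) = \int_0^1 R(t+\tau, t)\,dt. \qquad (15)$$

Then we have

$$R(\tau) = \sum_{l=m_1}^{m_0+m_1}\ \sum_{k=-\infty}^{\infty} R_l[k]\,2^l\int_{-\infty}^{\infty}\psi(u)\,\psi(u + 2^l\tau - k)\,du. \qquad (16)$$

So, the PSD of $s(t)$ is

$$R(\omega) = \sum_{l=m_1}^{m_0+m_1}\ \sum_{k=-\infty}^{\infty} R_l[k]\,|\hat\psi(2^{-l}\omega)|^2\,e^{-j2^{-l}\omega k} = \sum_{l=m_1}^{m_0+m_1} |\hat\psi(2^{-l}\omega)|^2\,R_l(2^{-l}\omega).$$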

From the theorems above, it is clear that wavelet-packet (wavelet) modulation
spreads the spectrum of the original signals. The spectrum of the modulated
signals becomes wider with greater N in Theorem 1 and $m_1$ in Theorem 2. Because
wavelets (and especially wavelet packets) have good time-frequency localization,
we can select N, l and $m_1$ so that the spectrum of the modulated
signals lies in the band we need. This is very important for a communication
system operating over a frequency-selective channel. The formula also shows that
the PSD of the modulated signals varies slowly with frequency f. Furthermore,
the PSD of the modulated signals becomes flatter with larger N and $m_1$.
Such a featureless waveform is helpful for covert operation of a communication
system.
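The spreading effect can be illustrated numerically from formula (14). The sketch below is only an illustration under stated assumptions: the Haar wavelet is used because its spectrum has the closed form $|\hat\psi(\omega)|^2 = (\sin^2(\omega/4)/(\omega/4))^2$, and the symbols of each user are taken to be uncorrelated with unit variance, so that $R_l[k] = \delta_{k,0}$ and $R_l(\omega) = 1$. The function names are hypothetical.

```python
import math

def haar_psi_hat_sq(w):
    """|psi_hat(w)|^2 for the Haar wavelet supported on [0, 1)."""
    if w == 0:
        return 0.0
    return (math.sin(w / 4) ** 2 / (w / 4)) ** 2

def psd_uncorrelated(w, m1, m0):
    """Formula (14) with R_l(w) = 1, summed over l = m1, ..., m0 + m1."""
    return sum(haar_psi_hat_sq(w / 2 ** l) for l in range(m1, m0 + m1 + 1))

# Sample the PSD on a coarse frequency grid for two choices of m1.
grid = [0.5 * i for i in range(1, 200)]
narrow = [psd_uncorrelated(w, 0, 2) for w in grid]
wide = [psd_uncorrelated(w, 4, 2) for w in grid]
```

With these assumptions, increasing $m_1$ moves the scaled copies $|\hat\psi(2^{-l}\omega)|^2$ toward higher frequencies, so the second curve carries noticeably more power at large $\omega$ than the first.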







3 Conclusion 

The PSD formulas of wavelet-packet and wavelet modulated signals are derived
in this paper. They describe the properties of the modulated signals in the frequency
domain and should be helpful in applications, especially in the design of
communication systems based on wavelet packets.

References 

1. K. M. Wong, Jiangfeng Wu, et al.: Performance of wavelet packet-division
multiplexing in impulsive and Gaussian noise, IEEE Transactions on Comm., Vol. 48,
No. 7, pp. 1083-1086, July 2000.

2. A. R. Lindsey, J. C. Dill: A digital transceiver for wavelet-packet modulation,
Proc. SPIE, Vol. 3391, pp. 255-264.

3. R. S. Orr, C. Pike, M. J. Lyall: Wavelet transform domain communication systems,
Proc. SPIE, Vol. 2491, pp. 271-282.

4. Prashant P. Gandhi, Sathyanarayan S. Rao, et al.: Wavelets for Waveform Coding
of Digital System, IEEE Transactions on Signal Processing, Vol. 45, No. 9, pp. 2387-
2390, Sep. 1997.

5. R. E. Learned, et al.: Wavelet-packet-based multiple access communication, Proc.
SPIE, Vol. 2303, pp. 246-264.



Orthogonal Multiwavelets with Dilation Factor a 



Shouzhi Yang, Zhengxing Cheng, and Hongyong Wang 

Department of Mathematics, Xi’an Jiaotong University 
Xi’an, 710049, P.R.China 
yangshouzhi@china.com 

Abstract. There are well-established construction formulas for the orthonormal
uniwavelet. However, no comparably good formula
with a similar structure seems to exist for multiwavelets. In particular, the construction
of multiwavelets with dilation factor a (a ≥ 2, a ∈ Z) lacks effective methods. In
this paper, a procedure for constructing compactly supported orthonormal
multiscale functions is first given, and then, based on the constructed
multiscale functions, we propose a method of constructing multiwavelets
which is similar to that for the uniwavelet. Finally, we give a specific example
illustrating how to use our method to construct multiwavelets.

1 Introduction 

Since Geronimo, Hardin and Massopust [1] presented the first example of
multiwavelets by using fractal interpolation functions, the study of multiwavelets has
drawn many researchers' attention (e.g., see [2], [3] and [4]). Later, more
examples were provided in [5] and [6]. As we know, Daubechies [7] obtained
construction formulas for the uniwavelet. Since a multiwavelet is a vector-valued
function, the construction of multiwavelets is more difficult than that of the
uniwavelet. Multiwavelets can simultaneously possess many desirable properties,
such as continuity, compact and short support, orthonormality, interpolation,
and, very importantly, symmetry or antisymmetry. For the uniwavelet, however,
some of these properties are impossible or mutually incompatible. In this respect,
applications of multiwavelets are more extensive than those of the uniwavelet. Therefore,
finding approaches to the construction of multiwavelets is very significant
both in theory and in applications. Donovan, Geronimo, and Hardin [8] discussed
the above problem by using fractal interpolation functions, but their
construction procedure is very complicated. The main objective of this paper is to give a
way of constructing compactly supported multiscale functions and the associated
multiwavelets.

2 Multiresolution Analysis 

Let $\Phi(x) = (\phi_1, \phi_2, \ldots, \phi_r)^T$, $\phi_1, \phi_2, \ldots, \phi_r \in L^2(R)$, satisfy the following
two-scale matrix equation:

$$\Phi(x) = \sum_{k=0}^{M} P_k\,\Phi(ax - k), \qquad (1)$$



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 157 163, 2001. 
c Springer- Verlag Berlin Heidelberg 2001 






where the $r \times r$ matrices $\{P_k\}$ are called the two-scale matrix sequence. $\Phi(x)$
is termed a multiscale function with dilation $a$ ($a \ge 2$, $a \in Z$) and multiplicity $r$.
Applying the Fourier transform to (1), we obtain

$$\hat\Phi(\omega) = P(z)\,\hat\Phi\Big(\frac{\omega}{a}\Big), \qquad P(z) = \frac{1}{a}\sum_{k=0}^{M} P_k z^k, \quad z = e^{-i\omega/a}. \qquad (2)$$

$P(z)$ is called the two-scale matrix symbol of the matrix sequence $\{P_k\}$ of $\Phi$.

Define the subspaces $V_j = \mathrm{clos}_{L^2(R)}\langle \phi_{\ell;j,k} : 1 \le \ell \le r,\ k \in Z\rangle$, $j \in Z$; here and
afterwards, for $f_\ell \in L^2$, we use the notation $f_{\ell;j,k} = a^{j/2} f_\ell(a^j x - k)$.

As usual, $\Phi$ in (1) generates a multiresolution analysis $\{V_j\}_{j \in Z}$ of $L^2(R)$
if $\{V_j\}_{j \in Z}$ satisfies the nestedness $\cdots \subset V_0 \subset V_1 \subset V_2 \subset \cdots$. Let $W_j$, $j \in Z$,
denote the orthogonal complementary subspace of $V_j$ in $V_{j+1}$, and let the vector-valued
function $\Psi(x) = (\psi_1, \psi_2, \ldots, \psi_{(a-1)r})^T$, $\psi_\ell \in L^2$, $\ell = 1, 2, \ldots, (a-1)r$,
constitute a Riesz basis for $W_j$, i.e., $W_j = \mathrm{clos}_{L^2(R)}\langle \psi_{\ell;j,k} : 1 \le \ell \le (a-1)r,\ k \in Z\rangle$, $j \in Z$.
It is clear that $\psi_1(x), \psi_2(x), \ldots, \psi_{(a-1)r}(x)$ are in $W_0 \subset V_1$.
Hence there exists a sequence of matrices $\{Q_k\}_{k \in Z}$ such that

$$\Psi(x) = \sum_{k=0}^{M} Q_k\,\Phi(ax - k). \qquad (3)$$

From the two-scale relation (3), we obtain

$$\hat\Psi(\omega) = Q(z)\,\hat\Phi\Big(\frac{\omega}{a}\Big), \qquad Q(z) = \frac{1}{a}\sum_{k=0}^{M} Q_k z^k. \qquad (4)$$

For column vector functions $\Lambda$ and $\Gamma$ with elements in $L^2(R)$, define
$\langle \Lambda, \Gamma \rangle = \int_R \Lambda(x)\,\Gamma(x)^T dx$. We call $\Phi = (\phi_1, \phi_2, \ldots, \phi_r)^T$ an orthogonal multiscaling
function if $\langle \Phi(\cdot), \Phi(\cdot - n) \rangle = \delta_{0,n} I_r$, $n \in Z$. $\Psi(x) = (\psi_1, \psi_2, \ldots, \psi_{(a-1)r})^T$ is
said to be an orthogonal multiwavelet associated with the multiscaling function
$\Phi$ if $\Psi(x)$ satisfies the equations $\langle \Phi(\cdot), \Psi(\cdot - n) \rangle = O_{r \times (a-1)r}$ and
$\langle \Psi(\cdot), \Psi(\cdot - n) \rangle = \delta_{0,n} I_{(a-1)r}$, $n \in Z$, where $O_{r \times (a-1)r}$
and $I_{(a-1)r}$ denote the zero matrix and the unit matrix, respectively.

Lemma 1 Let $\eta = (\eta_1, \eta_2, \ldots, \eta_r)^T$, where $\eta_1, \eta_2, \ldots, \eta_r \in L^2$. Then
$\{\eta_\ell(x - k) : 1 \le \ell \le r,\ k \in Z\}$ is a family of orthogonal functions if and only
if $\sum_{k \in Z} \hat\eta(\omega + 2k\pi)\,\hat\eta(\omega + 2k\pi)^* = I_r$, $|z| = 1$; here and throughout, the asterisk
denotes the complex conjugate of the transpose.

Lemma 2 Let $\Phi(x)$ be a multiscale function satisfying (1) and let $P(z)$ be its
two-scale matrix symbol. Then (i) $\Phi(x)$ is compactly supported, with $\mathrm{supp}\,\Phi(x) \subset [0, \frac{M}{a-1}]$;
(ii) $P(1)$ has eigenvalue 1, and $[P(1)]^n$ converges as $n \to \infty$; (iii) the
vector $u = \hat\Phi(0)$ is an eigenvector corresponding to the eigenvalue 1 of $P(1)$.

As in the case $a = 2$ (see [9]), (i) can be proved analogously; (ii) and
(iii) can also be deduced by a method similar to that in [2].







Lemma 3 Let $\Phi$ be a multiscale function satisfying (1). If both $P_0$ and $P_M$ are
not nilpotent, then $\mathrm{supp}\,\Phi = [0, \frac{M}{a-1}]$.

Lemma 3 can be proved by a method similar to that in [9].

3 Construction of Orthonormal Multiwavelets 

Theorem 1 Let $\Phi(x)$ be the orthogonal multiscaling function defined in
(1), let $P(z)$ be its two-scale matrix symbol, and let $\omega_j$, $j = 1, 2, \ldots, a$, be the roots of the
equation $z^a - 1 = 0$. Then $\sum_{j=1}^{a} P(\omega_j z)\,P(\omega_j z)^* = I_r$, $|z| = 1$, i.e.,

$$\sum_{i=0}^{M} P_i P_{i+ak}^T = a\,\delta_{0,k} I_r, \quad k \in Z. \qquad (5)$$

Further, suppose $\Psi = (\psi_1, \psi_2, \ldots, \psi_{(a-1)r})^T$ is an orthogonal multiwavelet
associated with $\Phi$ and $Q(z)$ is its two-scale matrix symbol. Then

$$\sum_{j=1}^{a} P(\omega_j z)\,Q(\omega_j z)^* = O, \qquad \sum_{j=1}^{a} Q(\omega_j z)\,Q(\omega_j z)^* = I_{(a-1)r}. \qquad (6)$$

Eqs. (6) are equivalent to the following Eqs. (7), respectively:

$$\sum_{i=0}^{M} P_i Q_{i+ak}^T = O, \qquad \sum_{i=0}^{M} Q_i Q_{i+ak}^T = a\,\delta_{0,k} I_{(a-1)r}. \qquad (7)$$

Theorem 1 follows easily from Lemma 1.
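Condition (5) is easy to test for a given coefficient sequence. The sketch below is an illustration rather than part of the paper: matrices are plain nested lists, the function names are hypothetical, and the example sequences (the Haar coefficients for $a = 2$, the box function for $a = 3$, and the rescaled Daubechies-4 coefficients) are the scalar case $r = 1$, chosen only because their orthogonality is well known.

```python
import math

def times_transpose(A, B):
    """Return the matrix product A * B^T for matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in B]
            for row_a in A]

def satisfies_eq5(P, a, tol=1e-12):
    """Check sum_i P_i P_{i+ak}^T = a * delta_{0,k} * I_r for all k >= 0.

    P is the list [P_0, ..., P_M] of r x r matrices. Negative k gives the
    transposed equations, so k >= 0 is sufficient.
    """
    r = len(P[0])
    M = len(P) - 1
    for k in range(M // a + 1):
        acc = [[0.0] * r for _ in range(r)]
        for i in range(M - a * k + 1):
            prod = times_transpose(P[i], P[i + a * k])
            for u in range(r):
                for v in range(r):
                    acc[u][v] += prod[u][v]
        for u in range(r):
            for v in range(r):
                target = float(a) if (k == 0 and u == v) else 0.0
                if abs(acc[u][v] - target) > tol:
                    return False
    return True

haar_a2 = [[[1.0]], [[1.0]]]               # phi(x) = phi(2x) + phi(2x - 1)
box_a3 = [[[1.0]], [[1.0]], [[1.0]]]       # phi(x) = sum of phi(3x - k), k = 0, 1, 2
r3 = math.sqrt(3)
db4_a2 = [[[(1 + r3) / 4]], [[(3 + r3) / 4]], [[(3 - r3) / 4]], [[(1 - r3) / 4]]]
```

Note that with the normalization of (1) the coefficients are $\sqrt{2}$ times the usual unit-norm filter taps when $a = 2$, which is why the right-hand side of (5) is $a$ rather than 1.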

Analogous to Hermite cardinal spline interpolation, $\Phi(x) = (\phi_1, \phi_2, \ldots, \phi_r)^T$
with common support is said to be interpolatory if it satisfies the following
condition:

$$\Phi^{(j-1)}(k + k_0) = \phi_j^{(j-1)}(k_0)\,\delta_{k,0}\,e_j, \qquad \phi_j^{(j-1)}(k_0) \ne 0, \quad 1 \le j \le r, \qquad (8)$$

where $e_1 = (1, 0, \ldots, 0)^T, \ldots, e_r = (0, \ldots, 0, 1)^T$.

Theorem 2 Let $\Phi(x)$ be a multiscale function with dilation $a$ and
multiplicity $r$ as in (1) that satisfies (8) for some positive integer $k_0$ ($1 \le k_0 \le [\frac{M}{a-1}]$). Then
we have

$$P_{ak+k_0} = \delta_{k,0} P_{k_0}, \qquad P_{k_0} = \mathrm{diag}\Big(1, \frac{1}{a}, \ldots, \frac{1}{a^{r-1}}\Big). \qquad (9)$$

Proof Taking $j - 1$ derivatives of (1) and applying the interpolation
condition (8), we have $P_{ak+k_0} e_j = \delta_{k,0}\,a^{-(j-1)} e_j$, $1 \le j \le r$, which implies (9).

Theorem 3 Let $\Phi(x) = (\phi_1, \phi_2, \ldots, \phi_r)^T$ be a multiscale function with
dilation $a$ as in (1) and let $P(z)$ be its two-scale matrix symbol. If $\mathrm{supp}\,\phi_i = [h_i, g_i]$, $1 \le i \le r$, then

(i) $\phi_{2i-1}$ are symmetric and $\phi_{2i}$ antisymmetric for all $i$ in the following
sense: $\phi_i(x) = (-1)^{i-1}\phi_i(h_i + g_i - x)$, $1 \le i \le r$, if and only if the entries $P_{i,j}$ of
the matrix $P(z)$ satisfy

$$P_{i,j}(z) = (-1)^{i+j} z^{a(h_i+g_i)-(h_j+g_j)}\,\overline{P_{i,j}(z)}, \quad 1 \le i, j \le r; \qquad (10)$$

(ii) $\phi_1, \phi_2, \ldots, \phi_{r_1}$ are symmetric and the remainder $\phi_{r_1+1}, \ldots, \phi_r$ are
antisymmetric in the sense $\phi_i(x) = \phi_i(h_i + g_i - x)$, $i = 1, 2, \ldots, r_1$, and $\phi_i(x) = -\phi_i(h_i + g_i - x)$,
$i = r_1 + 1, \ldots, r$, if and only if the entries $P_{i,j}$ of the matrix $P(z)$ satisfy

$$P_{i,j}(z) = \begin{cases} z^{a(h_i+g_i)-(h_j+g_j)}\,\overline{P_{i,j}(z)}, & 1 \le i, j \le r_1 \ \text{or}\ r_1 + 1 \le i, j \le r, \\ -z^{a(h_i+g_i)-(h_j+g_j)}\,\overline{P_{i,j}(z)}, & 1 \le i \le r_1 \ \text{and}\ r_1 + 1 \le j \le r, \ \text{or}\ r_1 + 1 \le i \le r \ \text{and}\ 1 \le j \le r_1; \end{cases} \qquad (11)$$

(iii) If $a(h_i + g_i) - (h_j + g_j)$ ($1 \le i, j \le r$) is strictly less than zero or is not
an integer, then $P_{i,j} = 0$.

Proof If $\phi_1, \ldots, \phi_r$ satisfy $\phi_i(x) = (-1)^{i-1}\phi_i(h_i + g_i - x)$, $1 \le i \le r$,
let $S_r = \mathrm{diag}(1, -1, \ldots, (-1)^{r-1})$. Then $\Phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_r(x))^T = S_r(\phi_1(h_1 + g_1 - x), \phi_2(h_2 + g_2 - x), \ldots, \phi_r(h_r + g_r - x))^T$; hence,

$$\hat\Phi(\omega) = S_r D_r(z^a)\,\overline{\hat\Phi(\omega)}, \qquad D_r(z) = \mathrm{diag}(z^{h_1+g_1}, z^{h_2+g_2}, \ldots, z^{h_r+g_r}).$$

Applying (2) successively, we obtain $P(z)\hat\Phi(\frac{\omega}{a}) = S_r D_r(z^a)\,\overline{P(z)}\,\overline{D_r(z)}\,S_r\hat\Phi(\frac{\omega}{a})$. Since
$\{\phi_i(x - k) : 1 \le i \le r,\ k \in Z\}$ is a Riesz basis of $V_0$, we have $P(z) = S_r D_r(z^a)\,\overline{P(z)}\,\overline{D_r(z)}\,S_r$
or, equivalently, $S_r P(z) S_r = D_r(z^a)\,\overline{P(z)}\,\overline{D_r(z)}$, which implies that (10)
holds. This completes the proof of Theorem 3.

Corollary 1 If $\mathrm{supp}\,\phi_1 = \mathrm{supp}\,\phi_2 = \cdots = \mathrm{supp}\,\phi_r = [0, \frac{M}{a-1}]$, then $\phi_{2i-1}$
are symmetric and $\phi_{2i}$ antisymmetric for all $i$ if and only if $P_k = S_r P_{M-k} S_r$.

In fact, since $\mathrm{supp}\,\phi_i = [0, \frac{M}{a-1}]$, we have $a(h_i + g_i) - (h_j + g_j) = M$, and we obtain
$P(z) = z^M S_r\,\overline{P(z)}\,S_r$ by (10). Hence, Corollary 1 holds.

As we know, for a multiscale function 3? (a?), if supp$(x) = [0, M], then 
supp$^(x) = [0, [^-]], where $^(x) = [&^-(ax), $^(ax — 1), • • • , (ax — a + 

m 

1)] 1 . Hence, without loss of generality, we only investigate the construction of 

rp 

multiwavelets with $a + 1$ coefficients, i.e., $\Phi(x)$ satisfies the following equation:

$$\Phi(x) = \sum_{k=0}^{a} P_k\,\Phi(ax - k). \qquad (12)$$

In the applications of multiwavelets, certain special properties are desirable,
such as interpolation and symmetry. In the two-scale matrix sequence $\{P_k\}$
associated with multiwavelets having these properties, there must exist
some $P_i$, $0 \le i \le a$, such that the matrix $(aI_r - P_i P_i^T)^{-1} P_i P_i^T$ is a positive
definite matrix.







Lemma 4 Let $\Phi(x)$ be an orthogonal compactly supported multiscale
function with dilation $a$ and multiplicity $r$ satisfying (12). Assume that there
exists a $P_i$, $0 \le i \le a$, such that the matrix $H$ defined by the following equation is a
positive definite matrix:

$$H^2 = (aI_r - P_i P_i^T)^{-1} P_i P_i^T. \qquad (13)$$

Let $H_s$ ($s = 1, 2, \ldots, a-1$) be $a - 1$ essentially different symmetric matrices
satisfying (13), and define $q_j^{(s)} = H_s P_j$ ($j \ne i$) and $q_j^{(s)} = -H_s^{-1} P_j$ ($j = i$), where
$j = 0, 1, \ldots, a$; $s = 1, 2, \ldots, a-1$. Then

$$P_0 (q_a^{(s)})^T = O, \qquad (14)$$

$$P_0 (q_0^{(s)})^T + P_1 (q_1^{(s)})^T + \cdots + P_a (q_a^{(s)})^T = O, \qquad (15)$$

$$q_0^{(s)} (q_a^{(s)})^T = O, \quad s = 1, 2, \ldots, a-1, \qquad (16)$$

$$q_0^{(s)} (q_0^{(s)})^T + q_1^{(s)} (q_1^{(s)})^T + \cdots + q_a^{(s)} (q_a^{(s)})^T = aI_r. \qquad (17)$$

Proof For convenience, let $i = 1$. (14) and (16) can be proved easily by
using (6). For (15) and (17), we have from (6) that

$$\sum_{\ell=0}^{a} P_\ell (q_\ell^{(s)})^T = P_0 P_0^T H_s - P_1 P_1^T H_s^{-1} + \cdots + P_a P_a^T H_s$$
$$= [P_0 P_0^T + P_2 P_2^T + \cdots + P_a P_a^T] H_s - P_1 P_1^T H_s^{-1}$$
$$= [aI_r - P_1 P_1^T] H_s - P_1 P_1^T H_s^{-1}$$
$$= [(aI_r - P_1 P_1^T) H_s^2 - P_1 P_1^T] H_s^{-1} = O,$$

$$\sum_{\ell=0}^{a} q_\ell^{(s)} (q_\ell^{(s)})^T = H_s P_0 P_0^T H_s + H_s^{-1} P_1 P_1^T H_s^{-1} + \cdots + H_s P_a P_a^T H_s$$
$$= H_s [P_0 P_0^T + P_2 P_2^T + \cdots + P_a P_a^T] H_s + H_s^{-1} P_1 P_1^T H_s^{-1}$$
$$= H_s [aI_r - P_1 P_1^T] H_s + H_s^{-1} P_1 P_1^T H_s^{-1}$$
$$= H_s^{-1} [H_s^2 (aI_r - P_1 P_1^T) H_s^2 + P_1 P_1^T] H_s^{-1}$$
$$= H_s^{-1} [H_s^2 P_1 P_1^T + P_1 P_1^T] H_s^{-1} = H_s^{-1} [H_s^2 + I_r] P_1 P_1^T H_s^{-1}$$
$$= H_s [P_1 P_1^T + H_s^{-2} P_1 P_1^T] H_s^{-1} = H_s\,aI_r\,H_s^{-1} = aI_r.$$

This completes the proof of Lemma 4.
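The construction of Lemma 4 can be traced by hand in the scalar case $r = 1$, where $H$ is just a positive number. The sketch below is only an illustration under that assumption (for $r = 1$ there is a single positive solution of (13), so it does not reproduce the $a - 1$ essentially different matrices $H_s$ of the general case); the function name is hypothetical.

```python
import math

def lemma4_scalar(P, a, i):
    """Scalar (r = 1) instance of Lemma 4.

    P is the list of scalar two-scale coefficients. H solves (13), i.e.
    H^2 = P_i^2 / (a - P_i^2); the result is q_j = H * P_j for j != i
    and q_i = -P_i / H.
    """
    pi_sq = P[i] ** 2
    if not 0 < pi_sq < a:
        raise ValueError("H^2 in (13) must be positive")
    H = math.sqrt(pi_sq / (a - pi_sq))
    return [(-P[j] / H if j == i else H * P[j]) for j in range(len(P))]

q_haar = lemma4_scalar([1.0, 1.0], 2, 1)        # dilation 2, Haar scaling coefficients
q_box3 = lemma4_scalar([1.0, 1.0, 1.0], 3, 1)   # dilation 3, box-function coefficients
```

For $a = 2$ and $P_0 = P_1 = 1$ this gives $q = (1, -1)$, the Haar wavelet coefficients; in both examples the scalar analogues of (15) and (17), $\sum_j P_j q_j = 0$ and $\sum_j q_j^2 = a$, hold.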

In the setting of Lemma 4, we can generate $a - 1$ sequences $\{q_k^{(s)}\}$, $s = 1, 2, \ldots, a-1$.
We construct the following functions in terms of these sequences:

$$\psi_s(x) = \sum_{k=0}^{a} q_k^{(s)}\,\Phi(ax - k), \quad s = 1, 2, \ldots, a-1. \qquad (18)$$







Applying Schmidt orthonormalization to the $a$ functions $\Phi(x)$, $\psi_s(x)$, $s = 1, 2,
\ldots, a-1$, and generating the $a$ functions $\Phi(x)$, $\Psi_s(x)$, $s = 1, 2, \ldots, a-1$, we can
conclude that there must exist $a - 1$ sequences $\{Q_k^{(s)}\}$, $s = 1, 2, \ldots, a-1$, such
that

$$\Psi_s(x) = \sum_{k=0}^{a} Q_k^{(s)}\,\Phi(ax - k), \quad s = 1, 2, \ldots, a-1. \qquad (19)$$

Hence, we have the following theorem: 

Theorem 4 In the setting of Lemma 4, let $\Psi_s(x)$, $s = 1, 2, \ldots, a-1$, be
defined as in (19). Define $\Psi(x) = [\Psi_1(x)^T, \Psi_2(x)^T, \ldots, \Psi_{a-1}(x)^T]^T$. Then $\Psi(x)$
is a compactly supported orthogonal multiwavelet with dilation $a$ associated with
$\Phi(x)$, and it satisfies the following two-scale matrix equation:

$$\Psi(x) = \sum_{k=0}^{a} [(Q_k^{(1)})^T, (Q_k^{(2)})^T, \ldots, (Q_k^{(a-1)})^T]^T\,\Phi(ax - k). \qquad (20)$$



Corollary 2 In the setting of Lemma 4, (i) If dilation factor a = 2 , then 
J>i(x) defined in (18) is compactly supported orthogonal multiwavelets with 
dilation 2 associated with <&(#); (ii) If dilation factor a = 3, and 4> s (ir) = 

3 

Q^&(ax — k),s = 1,2 . Let 'f'(x) = [^i(x)^, ^(x)^]^, then Sfr(x) is 

k = 0 

compactly supported orthogonal multiwavelets with dilation 3 associated with 
4»(x), and satisfies ( 20 ) in which , 



Qk 1] = 9fc 1} 

fr =0 



(21) 



4 Example 



We illustrate by a specific example how to construct orthogonal
multiwavelets based on our method.

Example (Construction of orthogonal multiwavelets with dilation 3 and
multiplicity 3)

Let $\Phi(x) = (\phi_1, \phi_2, \phi_3)^T$ satisfy $\Phi(x) = P_0\Phi(3x) + P_1\Phi(3x - 1) + P_2\Phi(3x - 2)$.
By Lemma 2, $\mathrm{supp}\,\Phi(x) \subset [0, 1]$. Suppose both $\phi_1$ and $\phi_3$ are symmetric,
$\phi_2$ is antisymmetric, and $\Phi(x)$ satisfies the interpolatory condition (8) with $k_0 = 1$.
Then, in view of (9), taking $i = 1$ and using Theorem 4, we obtain



[Explicit matrices of the example, including $q_k^{(1)}$ and $q_k^{(2)}$: the entries are not recoverable from the scanned source.]

Finally, we obtain orthogonal multiwavelets by (20) and (21). 



References 

1. Geronimo, J., Hardin, D. P., Massopust, P.: Fractal Functions and Wavelet
Expansions Based on Several Scaling Functions. J. Approx. Theory 78 (1994) 373-401

2. Chui, C. K., Lian, J.: A Study on Orthonormal Multiwavelets. J. Appl. Numer.
Math. 20 (1996) 273-298

3. Lian, J.: Orthogonal Criteria for Multiscaling Functions. Appl. Comp. Harm. Anal.
5 (1998) 277-311

4. Hardin, D. P., Marasovich, J. A.: Biorthogonal Multiwavelets on [-1,1]. Appl. Comp.
Harm. Anal. 7 (1999) 34-53

5. Goh, S. S., Yap, V. B.: Matrix Extension and Biorthogonal Multiwavelets
Construction. Linear Algebra and Applications 269 (1998) 139-157

6. Marasovich, J.: Biorthogonal Multiwavelets. Dissertation, Vanderbilt University,
Nashville, TN (1996)

7. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia, PA (1992)

8. Donovan, G. C., Geronimo, J., Hardin, D. P.: Construction of Orthogonal Wavelets
Using Fractal Interpolation Functions. SIAM J. Math. Anal. 27 (1996) 1158-1192

9. Wang So, Jianzhang Wang: Estimating the Support of a Scaling Vector. SIAM J.
Matrix Anal. Appl. 1 (1997) 66-73




A Wavelet-Based Image Indexing, Clustering, 
and Retrieval Technique Based on Edge Feature 



Masaaki Kubo 1 , Zaher Aghbari 1 , Kun Seok Oh 2 , and Akifumi Makinouchi 1 

1 Graduate School of Information Science and Electrical Engineering, Department of 
Intelligent Systems, Kyushu University 
6-10-1 Hakozaki, Higashi-ku, Fukuoka- shi 812-8581, Japan 
{kubo, zaher , akifumi}@db . is .kyushu-u. ac . jp 
2 Division of Computer Engineering College of Engineering Chosun University 
375 Susuk-dong Dong-gu Kwangju 501-759 Korea 
okseak38@hotmail . com 



Abstract. This paper proposes a technique for indexing, clustering and 
retrieving images based on their edge features. In this technique, images 
are decomposed into several frequency bands using the Haar wavelet 
transform. From the one-level decomposition sub-bands an edge image 
is formed. Next, the higher order auto-correlation function is applied on 
the edge image to extract the edge features. These higher order autocor- 
relation features are normalized to generate a compact feature vector, 
which is invariant to shift, image size and gray level. Then, these feature 
vectors are clustered by a self-organizing map (SOM) based on their 
edge feature similarity. The performed experiments show the high preci- 
sion of this technique in clustering and retrieving images in a large image 
database environment. 



1 Introduction 

In the past decade, the number of digital images has increased tremendously
due to the steady growth of computer power, the decline of storage cost, and the rapid
increase in access to the Internet. Therefore, fast and effective methods to
organize and search images in large image database environments are essential. In
particular, images need to be effectively clustered, and a fast content-based
mechanism is then required to retrieve the desired images.

Currently, two main indexing approaches exist: (1) indexing images based on
features from raw image data [1] [2], such as pixel intensity, histogram, etc.; (2)
indexing images based on coefficients in the transform domain [3] [4] [5] [6], such as
the total energy of wavelet coefficients. These extracted features are represented by
means of a feature vector, which is a compact representation of the image content.
These feature vectors are then organized by a spatial access method (SAM),
such as B-tree, R-tree, etc., or a clustering method, such as a self-organizing map
(SOM) [7]. When a query Q, such as "Find images similar to Q", is issued, Q is
compared with the database of feature vectors that represent the image database.
As a result, the K most similar images to Q are returned to the user.
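The query step can be sketched as a nearest-neighbour search over feature vectors. The sketch below is a simplification, not the system described in this paper: it scans the whole database linearly instead of using a SAM or SOM, it assumes Euclidean distance as the similarity measure (the text does not name one at this point), and the function name and toy data are hypothetical.

```python
import heapq
import math

def k_most_similar(query, database, k):
    """Return the names of the k feature vectors closest to `query`.

    database maps an image name to its feature vector; all vectors must
    have the same length as the query vector.
    """
    ranked = heapq.nsmallest(
        k, database.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked]

# Toy database with two-dimensional feature vectors (illustrative values only).
toy_db = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
nearest = k_most_similar([0.9, 0.9], toy_db, 2)
```

A linear scan costs time proportional to the database size per query, which is the cost that the indexing and clustering structures mentioned above are meant to reduce.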

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 164 176, 2001. 
c Springer- Verlag Berlin Heidelberg 2001 






This paper presents a Haar wavelet-based technique that extracts the edge
features from an edge image, which is generated from the one-level decomposition
sub-bands of an image, by means of the higher order autocorrelation method.
Since images in large databases come in different sizes and gray levels, it is
essential to adapt the extracted features to tolerate such differences, an
important property lacking in previous work [3] [4]. Thus, in this work, the extracted
higher order autocorrelation features are normalized; as a result, they become
invariant to shift, image size and gray level. The normalized features of an image
are combined into a compact feature vector (25 feature values). Then, the
feature vectors of all images are clustered by a SOM method. The system supports
query-by-example access to the images.

The rest of this paper is organized as follows: the related work is surveyed in
Sect. 2. In Sect. 3, we present the system architecture. The indexing and clustering
technique of our system is discussed in Sect. 4. Then, the querying method and
experimental results are discussed in Sect. 5. Finally, we conclude the paper in
Sect. 6.



2 Related Work 

An example of indexing based on raw image data is the QBIC system [1] of
IBM, which indexes images on multiple features, such as color histograms, texture,
shapes, etc. Although such multiple features provide an effective representation
of an image, they are computationally expensive during both the index
computation phase and the query processing phase. Another example of indexing based
on raw image data is the VisualSEEK system [2], which indexes each image in
the database by its salient color regions.

For indexing based on transform-domain coefficients, Wang et al. [5]
have proposed a wavelet-based image indexing and searching (WBIIS) algorithm.
In the WBIIS project, Daubechies' wavelet transforms are employed to produce
color feature vectors that, as argued by the authors, provide better frequency
localization than other traditional color layout coding algorithms. Another example
of indexing based on the transform domain is proposed by Jacobs et al. [8], who
present an image searching algorithm that makes use of the multiresolution Haar
wavelet decompositions of images.

In large image databases, it is essential to organize and/or classify feature
vectors into different clusters to speed up the search. This organization and/or
clustering of images is based on the similarity of feature vectors of images. Here
we introduce some examples that utilize such algorithms. Albuz et al. [3] have
proposed an algorithm to cluster the feature vectors, which represent images, in
a modified k order B-tree data structure, where k is the maximum number of
clusters. This approach utilizes the multiresolution property of the wavelet
transform to compute the feature vectors. The problem with this approach is
that the number of clusters has to be decided by the user before inserting keys
into the B-tree. Oja et al. [9] have introduced the PicSOM system to cluster
images based on Tree Structured Self-Organizing Maps (TS-SOMs). The TS-SOM
is a tree-structured vector quantization algorithm that uses SOMs [7] at
each of its hierarchical levels. However, since the SOM algorithm is not scalable to
new classes, if a new class of images is to be inserted into the database, the TS-SOM,
which is a hierarchy of SOMs, has to undergo a computationally expensive
process of retraining at each level of the hierarchy.

3 System Architecture 

The basic architecture of our system is shown in Fig. 1. The solid arrows show the 
sequence of processes of indexing and clustering images and the dashed arrows 
follow the sequence of processes of querying. As shown in Fig. 1, both the images 
to be indexed and the query image go through the same sequence of processes. 
However, in case of indexing and clustering, after the SOM-Based Clustering 
process the feature vector of an image is added to the corresponding cluster in 
the database. In case of querying, the images associated with the best matching 
node (cluster) to the query image are returned to the user. A detailed discussion 
of these processes is in Sect. 4. 

4 Indexing and Clustering 

As shown in Fig. 1, the system applies several processes to an image to index
and cluster it. In this section, we discuss these processes.

4.1 Haar Wavelet Transform 

The wavelet transform describes the image in terms of a coarse overall shape, 
plus some details that range from broad to narrow. The Haar wavelet transform 



Images to be 
indexed and 
clustered 




of images DB of clustered 

feature vectors 



Fig. 1. Basic architecture of a system: The solid arrows show the path of image 
indexing and clustering. The dashed arrows show the path of image querying 



A Wavelet-Based Image Indexing, Clustering, and Retrieval Technique 167 



w 



Original 

Image 



(a) 



, w/2 * 



LL 


HL 


LH 


HH 



(b) 



=F 




HL2 








HLi 


LH2 


HH2 




LHi 


HHi 



(c) 



Fig. 2. Wavelet multiresolution property of an image: (a) represents original 
image, (b) a one-level decomposition produces 4 sub-bands, namely LL, LH, HL 
and HH, (c) a four- level decomposition produces 13 sub-bands 



is applied iteratively on an image to generate a multi-level decomposition (see
Fig. 2). A level-$l$ decomposition produces $3l + 1$ sub-bands. In a large
image database environment, it is essential to represent images by a method
that supports the following requirements on a feature vector:

(1) Compact,

(2) Fast to compute, and

(3) Supports similarity retrieval.

Therefore, we are using a Haar wavelet transform to decompose images into 
several frequency bands and then compute a feature vector from these bands. 
The above requirements are satisfied as follows: 

1. Compact: By making use of the wavelet multiresolution property, we can
decompose an image and then use only a few coefficients to represent the image
content sufficiently. As shown in Fig. 3, the Haar wavelet transform decomposes
the original image (see Fig. 3.a) into four sub-bands: LL, LH, HL and HH (see
Fig. 3.b). The Haar wavelet coefficients, namely the Haar basis and the coefficient
details, are computed by Equations 1 and 2, respectively.




Fig. 3. An example of wavelet decomposition: (a) original image, (b) one-level
decomposition





168 



Masaaki Kubo et al. 




(1)

(2)



where $x(n)$ and $x(n + 1)$ are the current and next values of an image,
respectively. The LH and HL sub-bands are used to generate an edge image
(see Subsection 4.2). Then, we use the higher order autocorrelation function
to extract the edge features from the edge image (see Subsection 4.3). Only a
few (25) of the extracted higher order autocorrelation coefficients
are used to produce a feature vector. Thus, the feature vector of an image is
compact.

2. Fast to Compute: The Haar wavelet basis is the simplest wavelet basis
in terms of implementation, and the fastest to compute [6][8]. From Equation 1,
we notice that the Haar wavelet transform is mathematically equivalent to the
averaging of color blocks [5]. Because Haar wavelets are fast to compute, they
have become key to several applications such as data compression, data
transmission, denoising, and edge detection. One drawback of the Haar basis for
lossy compression is that it tends to produce blocky image artifacts at high
compression rates [8]. In our application, however, the result of compression is
never viewed; therefore, these artifacts do not affect our indexing and querying
processes.

3. Supports Similarity Retrieval: Similarity retrieval is preferred in image
databases because users can simply select an image that is similar to the wanted
image and then issue a query 'Find images that are similar to this query image'.
Alternatively, a user can make a rough sketch, such as the dominant edges, of a
wanted image and issue a query 'Find images that are similar to this sketch'.
To achieve this goal, we use only 25 normalized higher order autocorrelation
coefficients to represent the image. These 25 coefficients sufficiently approximate
the image and provide some margin for similarity retrieval.
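The one-level decomposition described in item 1 can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes the averaging form of the Haar step suggested by the remark that the transform is equivalent to averaging blocks (pairwise mean for the coarse part, pairwise half-difference for the detail). The function names `haar_1d` and `haar_2d_one_level` and the normalization by 2 are assumptions, and the labels LH and HL are assigned so that LH responds to vertical edges and HL to horizontal edges, as stated in Sect. 4.2; other texts use the opposite labeling.

```python
def haar_1d(values):
    """One Haar step on an even-length sequence.

    Returns the pairwise averages (coarse part) and the pairwise
    half-differences (detail part). The factor 1/2 is an assumption.
    """
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, details


def _filter_columns(rows):
    """Apply haar_1d down every column; return (low, high) as lists of rows."""
    columns = [list(col) for col in zip(*rows)]
    lows, highs = zip(*(haar_1d(col) for col in columns))
    # lows/highs are column-major; transpose back to row-major.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d_one_level(image):
    """Split an image (list of rows, even width and height) into 4 sub-bands."""
    # Filter along each row first: coarse half and detail half.
    row_low, row_high = zip(*(haar_1d(row) for row in image))
    # Row differences respond to vertical edges; column differences to
    # horizontal edges. The labels below follow that reading of the text.
    ll, hl = _filter_columns(row_low)
    lh, hh = _filter_columns(row_high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}
```

Applying `haar_2d_one_level` again to the returned `"LL"` band gives the next level, which is how a level-l decomposition accumulates 3l + 1 sub-bands.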

4.2 Edge Image Construction 

From a signal processing point of view, the wavelet transform is basically a
convolution operation, which is equivalent to passing an image through low-pass
and high-pass filters. Let the original image be $I(w,h)$; then the LH sub-band
represents the vertical edges and the HL sub-band represents the horizontal edges
of $I(w,h)$. Using these properties of the LH and HL sub-bands, we construct
an edge image. If an element in the LH sub-band is $v_{m,n}$ and an element in the
HL sub-band is $h_{m,n}$, then the corresponding element $e_{m,n}$ in the edge image is
given by Equation 3. Let $w$ and $h$ be the width and height, respectively, of the
LH and HL sub-bands.




(3) 






where $1 \le m \le w$ and $1 \le n \le h$. Fig. 4 shows the edge image constructed
from the LH and HL sub-bands of Fig. 3.b using Equation 3. In our system, we
use the LH and HL sub-bands of the one-level decomposition because they carry
more detailed information about the dominant edges in the original image.
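A minimal sketch of the edge image construction is given below. Equation 3 is not legible in this copy, so the combination rule is an assumption: each edge value is taken as the Euclidean magnitude of the paired LH and HL coefficients, which is one common way to merge two directional responses. The function name `edge_image` is hypothetical.

```python
import math


def edge_image(lh, hl):
    """Combine two equally sized detail sub-bands into one edge-strength image.

    Assumption: the edge value is the Euclidean magnitude of the paired
    LH and HL coefficients; the paper's Equation 3 may use a different rule.
    """
    if len(lh) != len(hl) or any(len(a) != len(b) for a, b in zip(lh, hl)):
        raise ValueError("LH and HL sub-bands must have the same shape")
    return [
        [math.hypot(v, h) for v, h in zip(lh_row, hl_row)]
        for lh_row, hl_row in zip(lh, hl)
    ]
```

Because the magnitude ignores the sign of the coefficients, dark-to-bright and bright-to-dark transitions contribute equally to the edge image.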




Fig. 4. Constructed edge image from the LH and HL sub-bands of Fig. 3.b 



4.3 Feature Vector Generation 

The higher order autocorrelation features are the primitive edge features that 
we use to index and retrieve images. Such features are shift-invariant (irrelevant 
to where the objects are located in the image), which is a useful property in 
image querying. 

As defined in [10] and [11], let $P$ be an image plane and let the function $I(r)$
represent an image intensity function on the retinal plane $P$ such that $r \in P$.
That is, $r$ is the image coordinate vector. A shift (translation) of $I(r)$ within $P$ is
represented by $I(r + a)$, where $a$ is the displacement vector. Therefore, the
$N$th-order autocorrelation functions with $N$ displacements $a_1, \ldots, a_N$ are defined by

$$R_N(a_1, \ldots, a_N) = \sum_{P} I(r)\, I(r + a_1) \cdots I(r + a_N) \qquad (4)$$

It is clear from Equation 4 that the number of autocorrelation functions
obtained by the possible combinations of the displacements over the image plane
is large. Therefore, it is essential to reduce this number for practical
applications. Here, we limit the order $N$ to at most 2 ($N = 0, 1, 2$). Also, the range of
displacements is limited to a local 3 × 3 window, the center of which is
the reference local point.

The local mask patterns for extracting higher order autocorrelation features
are shown in Fig. 5. The 0th-order autocorrelation function corresponds to the
average gray level of the image $I(r)$. By eliminating the displacements that are
equivalent by shift, the number of unique patterns is reduced to 25, as shown in
Fig. 5. Using these mask patterns, the feature vector $f_v$ that contains the higher
order autocorrelation functions is defined as follows:




Fig. 5. The 25 local mask patterns for extracting higher order autocorrelation
features, where the order N is limited to 2



$$f_v = (f_1, \ldots, f_{25}) \qquad (5)$$

Let the position of the mask pattern in the 3 × 3 window be denoted by $x$
and $y$ coordinates, such that $I_{x,y}$ denotes the mask pattern of the 0th-order
autocorrelation function $f_1$. Also, let the width and height of the edge image $I$
be $w$ and $h$. Thus, each $f_i$ is defined as:



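$$
\begin{aligned}
f_1 &= \sum_x \sum_y (I_{x,y}) \\
f_2 &= \sum_x \sum_y (I_{x,y})(I_{x,y+1}) \\
&\;\;\vdots \\
f_5 &= \sum_x \sum_y (I_{x,y})(I_{x-1,y-1}) \\
f_6 &= \sum_x \sum_y (I_{x,y})(I_{x-1,y})(I_{x+1,y}) \\
&\;\;\vdots \\
f_{25} &= \sum_x \sum_y (I_{x,y})(\cdots)(\cdots)
\end{aligned}
$$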
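The definitions above can be illustrated with a short sketch that evaluates one autocorrelation feature from a list of displacements. It is a simplified illustration under stated assumptions: the function name `hlac_feature` and the three mask constants are hypothetical, the displacement sets are those we read off the formulas for $f_1$, $f_2$ and $f_6$, and border points are skipped so that every displaced pixel stays inside the image (the text does not state its border rule).

```python
def hlac_feature(image, displacements):
    """Sum over reference points of I(r) * I(r + a_1) * ... * I(r + a_N).

    `image` is a list of rows indexed as image[x][y], following the I_{x,y}
    notation of the text; `displacements` is a list of (dx, dy) offsets that
    lie inside a 3 x 3 window. Border points are skipped (an assumption).
    """
    total = 0
    for x in range(1, len(image) - 1):
        for y in range(1, len(image[0]) - 1):
            product = image[x][y]
            for dx, dy in displacements:
                product *= image[x + dx][y + dy]
            total += product
    return total


# Displacement sets for three of the 25 masks (illustrative reading).
MASK_F1 = []                   # order 0: I(x, y)
MASK_F2 = [(0, 1)]             # order 1: I(x, y) * I(x, y+1)
MASK_F6 = [(-1, 0), (1, 0)]    # order 2: I(x, y) * I(x-1, y) * I(x+1, y)
```

Since the same products are summed wherever a pattern sits, moving an object inside the image (away from the border) leaves every feature unchanged, which is the shift invariance mentioned at the start of this subsection.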

4.4 Feature Vector Normalization 

As mentioned in Sect. 4.3, the extracted higher order autocorrelation features
(see Equation 5) are invariant to shift (location of objects in the image). However,
in large collections of images, such as images on the Internet, digital libraries,
image archives, etc., images exist in different sizes and different intensities (gray
levels). Therefore, it is important to design our features so that the search result
includes the wanted image even if the selected query image (during the
query-by-example process) is shifted, different in size, or different in gray level
as compared with the wanted image.

In addition to being invariant to shift, we consider the following essential 
requirements on features for practical image search: 

1. Features should be invariant to the size of an image. 

2. Features should be invariant to the gray level of an image. 

Invariant to image size: For the first requirement, we divide the higher
order autocorrelation functions by the width $w$ and height $h$ of the original
image. As a result, the feature values are no longer proportional to the size of the
original image, which reduces the effect of the size difference between the query
image and the wanted image (see Equation 6).

Invariant to gray level: We notice from Equation 5 that the extracted
values of the higher order autocorrelation functions grow with the gray level
raised to a power that depends on the order of autocorrelation. For example, if the
sum of the gray level values of the original image equals $S$, then for the order
$N = 0$ the value of $f_1$ is $S^1$, and for $N = 2$ the value of a second-order
feature, say $f_6$, is approximately $S^3$. Therefore, we normalize
the gray level values of the extracted higher order autocorrelation features by
raising them to the power $1/(N+1)$, where $N$ is the order of autocorrelation (see
Equation 6).

$$R_N(a_1, \ldots, a_N) = \frac{1}{wh} \sum_{P} \sqrt[N+1]{I(r)\, I(r + a_1) \cdots I(r + a_N)} \qquad (6)$$

For example, $f_2 = \frac{1}{wh} \sum_x \sum_y \sqrt{(I_{x,y})(I_{x,y+1})}$. The effect of normalizing the
feature vectors is shown in Fig. 6 and Fig. 7. The two images in Fig. 6
differ in size and gray scale. Figures 7.a and 7.b show the feature vectors
(higher order autocorrelation features) of the two images before and after the
normalizing process, respectively. The normalization brings the feature vectors of
images that are similar but differ in size and gray level closer
together, which is a useful property in similarity-based retrieval. After being
normalized, the feature vectors are inserted into a SOM to be clustered. The
next section briefly introduces the SOM-based clustering process.
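The two normalizations can be sketched together as below. This is an illustration under assumptions rather than the authors' code: it follows our reading of the example for $f_2$, in which each product of $N + 1$ pixel values is reduced by an $(N+1)$th root before summation and the sum is divided by the image area. The function name `normalized_hlac_feature` is hypothetical, pixel values are assumed non-negative, and border points are skipped.

```python
def normalized_hlac_feature(image, displacements):
    """Size- and gray-level-normalized autocorrelation feature.

    Each product of N + 1 pixel values is raised to the power 1 / (N + 1),
    and the sum is divided by the image area. Pixel values are assumed to be
    non-negative; border points are skipped (an assumption).
    """
    height, width = len(image), len(image[0])
    order = len(displacements)          # N, the number of displacements
    total = 0.0
    for x in range(1, height - 1):
        for y in range(1, width - 1):
            product = float(image[x][y])
            for dx, dy in displacements:
                product *= image[x + dx][y + dy]
            total += product ** (1.0 / (order + 1))
    return total / (width * height)
```

With this form, multiplying every gray level by a constant multiplies every feature by the same constant whatever its order, so features of different orders stay on a common scale; a common factor of this kind does not change the angle between two feature vectors.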



4.5 SOM-Based Clustering 

The self-organizing map (SOM) [7] is an unsupervised neural network that maps
high-dimensional input data in $\mathbb{R}^n$ (in our case, the normalized higher order
autocorrelation features of an image) onto a usually two-dimensional output space
while preserving the topological relations (similarities) between the data items.
The SOM consists of nodes (neurons) arranged in a two-dimensional rectangular
or hexagonal grid. In our system, we simply arranged the SOM nodes in a






Fig. 6. An example to show the effect of normalizing feature vectors: the two 
images are different in size and gray scale 




Fig. 7. Effect of normalizing the feature vectors: (a) feature vectors of the two
images in Fig. 6 before the normalizing process, (b) after the normalizing process



2-dimensional rectangular grid. A weight vector $m_i \in \mathbb{R}^n$ is associated
with every node $i$. An input vector $x \in \mathbb{R}^n$ is compared with each $m_i$, and the
best-match node (BMN), which has the smallest angle $\theta_{BMN}$ (see Equation 7), is
determined. The input is thus mapped onto the location of the determined BMN.



$$\theta_{BMN} = \arccos \left\{ \frac{x \cdot m_i}{\|x\| \, \|m_i\|} \right\} \qquad (7)$$



The reason we used the angle $\theta$ between vectors as a measure of distance rather
than a simple Euclidean distance is illustrated in Fig. 8. The distances $d(a, c)$
and $d(b, c)$ between vectors are correctly expressed by the angles $\theta_{ac}$ and $\theta_{bc}$
rather than by the Euclidean distances $\|a - c\|_2$ and $\|b - c\|_2$ between the vectors,
respectively.
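The search for the best-match node by angle can be sketched as follows. The names `angle_between` and `best_matching_node` are hypothetical, and the codebook is represented simply as a list of weight vectors; this is a minimal illustration of Equation 7, not the authors' C++ implementation.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def best_matching_node(x, codebook):
    """Index of the codebook vector that makes the smallest angle with x."""
    return min(range(len(codebook)), key=lambda i: angle_between(x, codebook[i]))
```

For the input `[2, 2.1]` and the codebook `[[1, 0], [0, 1], [10, 10]]`, the function selects index 2, the vector pointing in almost the same direction, whereas the nearest vector in Euclidean distance is `[0, 1]`.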

The weight vector $m_c$ of the BMN is adapted to match the input vector.
This is done by moving $m_c$ towards $x$ by a certain fraction of the angle $\theta_{BMN}$.
Moreover, the weight vectors of nodes in the neighborhood of the BMN are
moved towards $x$, but to a lesser extent than the BMN. This learning process
finally leads to a topologically ordered mapping of the input vectors. That is, the
cluster structure within the data and the inter-cluster similarity are represented






Fig. 8. An example to illustrate the effectiveness of using the angle $\theta$ between
vectors instead of the 2-norm (Euclidean distance) as a measure of dissimilarity
(distance)



clearly in the map. The map is called a topological feature map and the weight 
vector held by a node is called a codebook vector. 
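One way to move a weight vector towards the input by a fraction of the angle is sketched below. The text does not give the update formula, so this is an assumption: spherical linear interpolation between the unit-length versions of the two vectors, which turns the weight vector by exactly the requested fraction of the angle. The function name `rotate_towards` and the parameter `fraction` are hypothetical; in a SOM the fraction would be larger for the BMN than for its neighbors.

```python
import math


def rotate_towards(weight, x, fraction):
    """Turn `weight` towards `x` by `fraction` of the angle between them.

    Uses spherical linear interpolation on the unit-length versions of both
    vectors, so the result has unit length (an assumption: with an angular
    match only directions matter). `fraction` lies in [0, 1].
    """
    norm_w = math.sqrt(sum(a * a for a in weight))
    norm_x = math.sqrt(sum(a * a for a in x))
    w_hat = [a / norm_w for a in weight]
    x_hat = [a / norm_x for a in x]
    cosine = max(-1.0, min(1.0, sum(a * b for a, b in zip(w_hat, x_hat))))
    theta = math.acos(cosine)
    if math.sin(theta) < 1e-12:      # already aligned (or exactly opposite)
        return w_hat
    keep = math.sin((1.0 - fraction) * theta) / math.sin(theta)
    pull = math.sin(fraction * theta) / math.sin(theta)
    return [keep * a + pull * b for a, b in zip(w_hat, x_hat)]
```

With `fraction = 0` the direction of the weight vector is unchanged, and with `fraction = 1` it coincides with the direction of the input.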



5 Querying and Results 

The programs of our system are written in C++. The database (images, codebook
vectors and clusters of images) is managed by the Jasmine-supported object
database management system developed by FUJITSU and Computer Associates.
We used Jasmine's Weblink to build a user interface that supports
query-by-example queries and to create the HTML retrieval templates that display
the results in a web browser. The experiments were performed on 620 images.
The system runs on a Sun SPARC Ultra-5_10 workstation (270 MHz, 128 MBytes).



5.1 Querying 

Currently, our system supports query-by-example, in which a user selects, from a
set of displayed images, a query image Q that is most similar to the wanted
image(s). Then, Q undergoes the same sequence of processes described in Sect.
4. Briefly, Q is decomposed, an edge image is generated, and the higher order
autocorrelation feature vector is extracted, normalized, and compared with the
codebook vectors of all nodes of the SOM. Again, the BMN that is most similar
to Q (has the smallest $\theta_{BMN}$, see Equation 7) is determined. Finally, the images
associated with the BMN are returned to the user for further manual browsing.



5.2 Results 

To provide numerical results, we tested 7 sample queries chosen randomly from
the image database. The result of each query Q is the set of images that are
associated with the BMN, which is the SOM node most similar to Q. By
examining the SOM clusters, we found that the number of images associated with any
of the SOM nodes is less than 15, which is small enough for manual browsing.







Fig. 9. Precision of the 7 sample queries and their average precision (dotted line) 

Table 1. Average time to determine the BMN and average precision of query
results

                              Average value
Time to determine BMN         4.96 seconds
Precision of query results    70.7%



From the result of each sample query, we calculated the precision p of the query 
results. Since all the images clustered under, or associated with, the BMN are 
returned to a user as a result of Q, the computed p is also a measure of precision 
of the SOM-based clustering method. 

To compute the precision of query results, let $N_T$ be the total number of
returned images (images associated with the BMN) and $N_R$ be the number of relevant
images among them. Then, the precision $p_i$ of the result of query $q_i$ is computed as
follows:

$$p_i = \frac{N_R}{N_T} \qquad (8)$$



Figure 9 shows the precision of the 7 sample queries and their average
precision (the dotted line). The average precision $p$ is computed as follows:

$$p = \frac{1}{N_Q} \sum_{i=1}^{N_Q} p_i \qquad (9)$$

where $N_Q$ is the total number of sample queries, which is equal to 7 in
our test. As shown in Table 1, the average precision of query results is about
70.7% (it is also a measure of the precision of the SOM-based clustering method).
We also measured the average query response time, which is equal to the time
it takes to determine the BMN of a query. Table 1 shows that the average query
response time is 4.96 seconds. Even though it is difficult to compare
with other systems due to differences in computing environments, our average






query response time is comparable to many systems such as [1][5][8] based on 
the recorded search time in the corresponding papers. 
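The precision measures used in this section can be written out as a short sketch. The function names are hypothetical, and the per-query precision assumes the usual definition, the number of relevant images divided by the number of returned images.

```python
def precision(relevant_count, returned_count):
    """Fraction of the returned images that are relevant."""
    if returned_count <= 0:
        raise ValueError("a query must return at least one image")
    return relevant_count / returned_count


def average_precision(per_query_precisions):
    """Arithmetic mean of the per-query precision values."""
    if not per_query_precisions:
        raise ValueError("at least one query is required")
    return sum(per_query_precisions) / len(per_query_precisions)
```

For instance, a node that returns 10 images of which 7 are relevant (hypothetical counts, not taken from the experiments) gives `precision(7, 10) = 0.7`.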

6 Conclusion 

In this work, we have implemented a wavelet-based indexing and retrieval system
that clusters images into a database and provides query-by-example access to
the stored images. We showed that the edge feature is important in indexing
and retrieving images. Our edge feature vector is compact, fast to compute
and supports similarity retrieval. Normalizing the edge features (higher order
autocorrelation features) makes them invariant to shift, image size and gray
level, which are essential properties for similarity-based retrieval in large image
database environments. Even though the system currently supports only
query-by-example querying, it can easily be extended to support querying by a sketch
of dominant edges, which is a rough representation of the image. Based on the
experimental results, the system shows a high average search precision (about 70.7%),
which is due to the SOM-based clustering of similar images. Although most of the
search time is spent in finding the BMN, the overall search time is comparable
to many existing systems.

References 

1. M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani,
J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker. Query by Image and Video
Content: The QBIC System. IEEE Computer Magazine, Sept. 1995.

2. J. R. Smith, S. F. Chang. VisualSEEK: A Fully Automated Content-Based Image
Query System. ACM Multimedia Conference, Boston, pp. 87-98, Nov. 1996.

3. E. Albuz, E. Kocalar, A. A. Khokhar. Scalable Image Indexing and Retrieval Using
Wavelets. ICASSP 1999.

4. M. Kobayakawa, M. Hoshi, T. Ohmori, T. Terui. Interactive Image Retrieval Based
on Wavelet Transform and Its Application to Japanese Historical Image Data. IPSJ
Trans. on , Vol. 40, No. 3, pp. 899-911, March 1999. (In Japanese)

5. J. Z. Wang, G. Wiederhold, O. Firschein, S. X. Wei. Content-based Image Indexing
and Searching Using Daubechies' Wavelets. Springer-Verlag Int'l Journal on Digital
Libraries, Vol. 1, pp. 311-328, 1997.

6. A. Natsev, R. Rastogi, K. Shim. WALRUS: A Similarity Retrieval Algorithm for
Image Databases. SIGMOD Record, Vol. 28, No. 2, pp. 395-406, Philadelphia, PA, 1999.

7. T. Kohonen. Self-Organizing Maps. Springer-Verlag, 1997. 2nd extended edition.

8. C. E. Jacobs, A. Finkelstein, D. H. Salesin. Fast Multiresolution Image Querying.
Proc. of ACM SIGGRAPH, New York, 1995.

9. E. Oja, J. Laaksonen, M. Koskela, S. Brandt. Self-Organizing Maps for
Content-Based Image Database Retrieval. Published by Elsevier Science B. V., in Kohonen
Maps, pp. 349-362, 1997.

10. T. Kurita, N. Otsu, T. Sato. A Face Recognition Method Using Higher Order Local
Autocorrelation And Multivariate Analysis. Proc. of 11th Int'l Conf. on Pattern
Recognition, pp. 213-216, The Hague, 1992.

11. M. Kreutz, B. Volpel, H. Janssen. Scale-Invariant Image Recognition Based on
Higher Order Autocorrelation Features. Pattern Recognition, Vol. 29, No. 1,
pp. 19-26, 1996.



Wavelet Applications in Segmentation of Handwriting 
in Archival Documents 



Chew Lim Tan, Ruini Cao, and Peiyi Shen 

School of Computing, National University of Singapore 
Kent Ridge, Singapore 117543 
{tancl, caorn, shenpy}@comp.nus.edu.sg



Abstract. The National Archives of Singapore keeps a large number of 
double-sided handwritten archival documents. Over long periods of 
storage, ink has seeped through the pages of these documents, resulting in
interfering images of handwriting coming from the back of the page. 
This paper addresses this problem of segmenting handwriting from both 
sides of a document by means of a wavelet approach. We first match 
both sides of a document page such that the interfering strokes are 
mapped with the corresponding strokes originating from the reverse 
side. This allows the identification of the foreground and interfering 
strokes. A wavelet reconstruction process then iteratively enhances the 
foreground strokes and smears the interfering strokes so as to strengthen 
the discriminating capability of an improved Canny edge detector 
against the interfering strokes. Experimental results confirm the validity 
of the wavelet approach. 



1 Introduction 

Document image analysis is an important research area of image processing and
pattern recognition [1]. Traditionally, text extraction, an essential step, is the
segmentation of text from the background. This paper, however, introduces a rather
different problem: how to extract clear text strings from seriously seeping,
dominating, overlapping and interfering images originating from the reverse side.
This problem is faced by the National Archives of Singapore in restoring the original
appearance of its valuable archival documents. Over long periods of storage of these
documents, the seeping of ink has resulted in double images as shown in Fig. 1. Our
task now is to segment the foreground handwriting from the interfering handwriting
originating from the reverse side. Usually, the foreground writing appears darker than
the interfering strokes. However, there are cases where the foreground and interfering
writings have similar intensities, or worse still, the interfering strokes are more
prominent than the foreground.

At the request of the National Archives of Singapore, we first looked for available
methods in the literature [1][2] to solve this problem. The first that came to our attention



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 176-187, 2001. 
Springer-Verlag Berlin Heidelberg 2001







was Negishi et al.'s [3] automatic thresholding algorithms, which dealt with the comparison
of old manuscripts with printed matter on the internet. These algorithms, based on
Otsu's [4] method, extract the character bodies from the noisy background. Next, we
found Liu and Srihari's [5] thresholding algorithm, which extracts characters from the
run-length featured texture background based on the structure-stroke units of text and
the distinguishable gray-level ranges between the characters and the background.
Similar work is seen in Liang and Ahmadi's algorithm [6], which adopts a
morphological approach to extract text strings from regular periodic overlapping
text/background images. White and Rohrer's [7] method may be more traditional. It is
basically an image thresholding technique based on the boundary characteristics to
suppress unwanted background patterns. Very similar work can be seen in Don's work,
which segments double-sided images based on the isolated gray-scale range of
interfering images and the noise characteristics [8].

The methods surveyed thus far basically assume separable gray scale and/or distinctive
features between the foreground and background. Our present problem, however,
violates these assumptions. Valuable work was further found in Lu et al.'s contribution [9].
Their method not only enhances the contrast of the edges in the low contrast area but
also changes the intensity of the gray level of the edges. Lu also presents another
wavelet method that decreases the edge contrast and smears the direct components
of the edges with their neighboring pixels [10]. This appears to present an exciting
avenue for comparing corresponding edges from both sides of a document page.
However, though his edge-based wavelet image preprocessing method can handle the
change of the feature coefficients (local maxima) [11][12][13], it is found inadequate
in meeting the following challenges in our present problem: (1) Due to the anisotropic
absorption of the paper materials, the edges could be very different in shape and
position between the interfering strokes appearing on the front and their corresponding
originating strokes written on the reverse side. (2) As a result, any mismatch between the
interfering strokes observed on the front and their original strokes on the reverse side
will result in a mistaken identity of interfering strokes as foreground edges.

In view of the above, we have proposed an improved Canny detection method to
suppress unwanted interfering strokes [14]. The orientation information from the
Canny detector is also used to favor foreground strokes that are predominantly slanting
at an angle [15]. This paper further reports a wavelet approach to enhancing (i.e.
sharpening) the foreground strokes and weakening (i.e. smearing) the interfering
strokes in order to give the improved Canny detector even greater power to
discriminate between the writings from both sides. Section 2 describes how both
sides of a document are matched to identify foreground and interfering strokes as
candidates for enhancement and smearing, respectively. Section 3 then provides the
details of an iterative wavelet reconstruction process to progressively enhance and
smear the respective components. Section 4 discusses how the resultant enhanced and
smeared images improve the robustness of the Canny edge detector. Experimental
results with images from the National Archives of Singapore are given in Section 5,
followed by the conclusion and future work in Section 6.




Fig. 1. Sample images: (a) front side of sample 1; (b) reverse side of sample 1; (c) front side of
sample 2; (d) reverse side of sample 2



2 Image Matching and Overlay 

It is observed that the interfering strokes are not as sharp as the normal strokes. Also,
it is natural that weak foreground strokes may not necessarily seep into the reverse side
(see Fig. 1(d)). On the other hand, interfering strokes must have originated from
strong foreground strokes on the reverse side. Thus, we match both images from either
side of a page by hand. To facilitate the ensuing wavelet operations, a sub-image of
$M \times N$ is taken from the whole image. The sub-images are reassembled in the final
result. Let $F(m,n)$ denote the $k$ bits per pixel gray-scale front image, and $B(m,n)$ the
reverse side image of the same page, where $m$ and $n$ represent the line and the column,
respectively. An overlay operation is carried out as follows:









(a) Invert the reverse side image:

$$\mathrm{invert}(B(m,n)) = (2^k - 1) - B(m,n) \qquad (1)$$

(b) Flip the inverted image and superimpose it on the front image such that
corresponding strokes on either side are matched:

$$A(m,n) = \mathrm{flip}(\mathrm{invert}(B(m,n))) \vee F(m,n) \qquad (2)$$

where flip() means flipping the image horizontally, resulting in its mirror image:

$$\mathrm{flip}(B(m,n)) = B(m, N - n) \qquad (3)$$

(c) Scale the resultant image:

$$A'(m,n) = \frac{A(m,n) - \min(A)}{\max(A) - \min(A)} \, (2^k - 1) \qquad (4)$$

Fig. 2 shows the results of the overlay processing. Comparing Fig. 1 and Fig. 2, it can be
seen that most of the interfering strokes have been weakened by the overlay process
while the majority of the foreground strokes remain intact. These foreground strokes,
though somewhat impaired, now serve as seeds to start the following enhancement and
smearing processes. The idea now is to detect the foreground strokes on the front and
enhance them using wavelets. The detected and binarized strokes [14][15] from the
foreground overlay image form what we call the enhancement feature image. At the
same time, we detect the foreground strokes on the reverse side to locate their
corresponding interfering strokes on the front so as to smear these interfering strokes by
means of wavelets. The detected and binarized strokes [14][15] from the reverse side
overlay image result in what we call the smearing feature image. Iterative
enhancement and smearing processes are then carried out on the original front image using the
enhancement and smearing feature images to identify candidate strokes.
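The overlay steps (a) to (c) can be sketched as follows. This is a simplified illustration with hypothetical names, not the authors' code. The text does not spell out how the two gray levels are merged when the images are superimposed, so the `combine` parameter defaults to pixel-wise addition, which is an assumption; any other pixel-wise rule can be passed in its place. The sketch also assumes that the two sides have already been registered by hand, as described above.

```python
def overlay(front, back, bits=8, combine=lambda b, f: b + f):
    """Overlay the inverted, mirrored reverse side on the front image.

    `front` and `back` are equally sized lists of rows of gray levels in
    [0, 2**bits - 1]. `combine` merges one (reverse, front) pixel pair;
    addition is an assumption made for this sketch.
    """
    top = 2 ** bits - 1
    inverted = [[top - value for value in row] for row in back]   # step (a)
    mirrored = [row[::-1] for row in inverted]                    # horizontal flip
    merged = [[combine(b, f) for b, f in zip(b_row, f_row)]       # step (b)
              for b_row, f_row in zip(mirrored, front)]
    low = min(min(row) for row in merged)
    high = max(max(row) for row in merged)
    if high == low:                       # flat image: nothing to stretch
        return [[0 for _ in row] for row in merged]
    return [[(value - low) * top / (high - low) for value in row]  # step (c)
            for row in merged]
```

With addition, a pixel that is dark on the front because of seepage meets a bright pixel from the inverted reverse side and is pushed towards the background level, while a genuine foreground stroke, which has no dark counterpart on the reverse side, stays dark.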




3 Iterative Wavelet Reconstruction 



Let $f(x,y)$ be the original image, $E(x,y)$ be the enhancement feature image and $S(x,y)$ be
the smearing feature image. The three sub-images have the same dimension $M \times N$. The
enhancement and smearing features may be described as follows:



$$E(x,y) = \begin{cases} 0, & \text{background} \\ 255, & \text{detected stroke} \end{cases} ; \qquad S(x,y) = \begin{cases} 0, & \text{background} \\ 255, & \text{detected stroke} \end{cases} \qquad (5)$$



The wavelet decomposition of $f(x,y)$ is written as follows [16], where $j$ is the scale
number of the wavelet decomposition:




$$
\begin{cases}
C_j f(m,n) = \langle f(x,y), \phi_{j,m,n}(x,y) \rangle, & (m,n) \in Z^2 \\
D_j^1 f(m,n) = \langle f(x,y), \psi^1_{j,m,n}(x,y) \rangle, & (m,n) \in Z^2 \\
D_j^2 f(m,n) = \langle f(x,y), \psi^2_{j,m,n}(x,y) \rangle, & (m,n) \in Z^2 \\
D_j^3 f(m,n) = \langle f(x,y), \psi^3_{j,m,n}(x,y) \rangle, & (m,n) \in Z^2
\end{cases} \qquad (6)
$$

Fig. 2. Overlay results: (a) front side of sample 1; (b) reverse side of sample 1; (c) front side of
sample 2; (d) reverse side of sample 2



A 10-scale wavelet decomposition of the original image $f(x,y)$ may be described as
follows:

$$Wf(x,y) = \{C_9 f(x,y), D_0^1 f(x,y), D_0^2 f(x,y), D_0^3 f(x,y), \ldots, D_9^1 f(x,y), D_9^2 f(x,y), D_9^3 f(x,y)\} \qquad (7)$$







With the image wavelet representation $Wf(x,y)$, the enhancement feature $E(x,y)$ and the
smearing feature $S(x,y)$, the iterative wavelet reconstruction may be described as
follows:

First, the multi-scale decomposition of the foreground is computed. Unlike the traditional
wavelet reconstruction, the resultant image in each scale and iteration retains the same size
as the original foreground sub-image, i.e. $M \times N$.

$$Wf(x,y) = \{C_9(x,y), D_j^k(x,y),\ j = 0, \ldots, 9,\ k = 1, 2, 3\} \qquad (8)$$

The magnitude of the wavelet coefficients in all the scales is revised by the
enhancement coefficient $e_j^k$ and the smearing coefficient $s_j^k$ according to the algorithm in
equation (9), where $e_j^k$ and $s_j^k$ ($e_j^k \ge 1$, $1 \ge s_j^k \ge 0$, $j = 0, \ldots, 9$, $k = 1, 2, 3$) are set
empirically. The enhanced/smeared image $f'(x,y)$ is reconstructed from the modified
coefficients.

Do { if $E(x,y) == 255$: $D_j^k(x,y) = e_j^k D_j^k(x,y)$;
     if $S(x,y) == 255$: $D_j^k(x,y) = s_j^k D_j^k(x,y)$;
} while ($j = 0, \ldots, 9$; $k = 1, 2, 3$; $x = 0, \ldots, N$; $y = 0, \ldots, N$)    (9)

$$f'(x,y) = \text{inverse wavelet transform}(\{C_9(x,y), D_j^k(x,y),\ j = 0, \ldots, 9,\ k = 1, 2, 3\}) \qquad (10)$$
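The coefficient revision of equation (9) can be sketched as below. The sketch covers only the scaling of the detail coefficients; the wavelet transform and its inverse are left out. All names are hypothetical, and the data layout (`details[j][k]` as an image-sized plane, which matches the remark that every scale keeps the size of the sub-image) is an assumption made for illustration.

```python
def revise_details(details, enhance_mask, smear_mask, enhance_gain, smear_gain):
    """Scale detail coefficients where the feature images mark a stroke.

    `details[j][k]` is an image-sized coefficient plane (list of rows) for
    scale j and orientation k; the masks hold 255 at detected strokes and 0
    elsewhere; `enhance_gain[j][k]` >= 1 and 0 <= `smear_gain[j][k]` <= 1.
    Returns new planes and leaves the input untouched.
    """
    revised = []
    for j, planes in enumerate(details):
        revised_planes = []
        for k, plane in enumerate(planes):
            new_plane = [row[:] for row in plane]
            for x, row in enumerate(new_plane):
                for y in range(len(row)):
                    # Both tests are applied in turn, as in the pseudocode.
                    if enhance_mask[x][y] == 255:
                        row[y] *= enhance_gain[j][k]
                    if smear_mask[x][y] == 255:
                        row[y] *= smear_gain[j][k]
            revised_planes.append(new_plane)
        revised.append(revised_planes)
    return revised
```

Where a pixel is marked in both feature images, the two gains are applied one after the other, so a falsely enhanced coefficient is still attenuated by the smearing gain.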

Apply the wavelet transform again to the reconstructed image $f'(x,y)$. Note that in
obtaining the inverse of the wavelet transform, the revised $D_j^k$ obtained in equation
(9) is used again.

$$Wf'(x,y) = \{C'_9(x,y), D'^k_j(x,y),\ j = 0, \ldots, 9,\ k = 1, 2, 3\} \qquad (11)$$

$$f'(x,y) = \text{inverse wavelet transform}(\{C'_9(x,y), D_j^k(x,y),\ j = 0, \ldots, 9,\ k = 1, 2, 3\}) \qquad (12)$$

By iterating the wavelet decomposition and reconstruction of equations
(11) and (12), we obtain the final enhanced/smeared gray-scale image.

Clip the final enhanced/smeared image using the following function:

$$f'(x,y) = \begin{cases} 0, & f'(x,y) \le 0 \\ 255, & f'(x,y) \ge 255 \\ f'(x,y), & \text{otherwise} \end{cases} \qquad (13)$$



In our implementation, we used the wavelet transform up to 10 scales for images of
size 512 × 512. In the reconstruction process, we set 15 to be the maximum number of
iterations. Fig. 3(a) and (d) show the results after 15 iterations. After the final
enhancement and smearing, the resultant image containing both the enhanced and
smeared features is then processed by the improved Canny edge detection [14][15].
As the Canny edge detector favors sharper edges over smoother ones, the above
enhancement and smearing processes strengthen this discriminating capability of the
Canny detector. The detected edges are used as loci to recover the strokes from the original
image within a 7 × 7 window centered at each detected edge point. The recovered gray
level images are shown in Fig. 3(b) and (e), respectively. Niblack's threshold [17]
is then adapted to binarize the image to give a clear readable copy for the reader as
shown in Fig. 3(c) and (f).
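The recovery of gray levels around the detected edges can be sketched as follows. The function name and parameters are hypothetical; `half_window=3` corresponds to the 7 × 7 window mentioned above, and filling the remaining pixels with white (255) is an assumption, since the text does not say what value is used outside the windows.

```python
def recover_around_edges(original, edge_points, half_window=3, background=255):
    """Copy original gray levels inside a window around each edge point.

    `edge_points` is an iterable of (row, column) pairs; `half_window=3`
    gives a 7 x 7 window. Pixels outside every window are set to
    `background` (white paper is an assumption).
    """
    rows, cols = len(original), len(original[0])
    recovered = [[background] * cols for _ in range(rows)]
    for r, c in edge_points:
        # Clamp the window to the image so border edge points are handled.
        for i in range(max(0, r - half_window), min(rows, r + half_window + 1)):
            for j in range(max(0, c - half_window), min(cols, c + half_window + 1)):
                recovered[i][j] = original[i][j]
    return recovered
```

The result keeps the original gray levels only near the strokes that survived edge detection, which is the image that the subsequent binarization step operates on.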




Fig. 3. Enhancement/smearing and segmentation results: (a) enhanced/smeared front side of 
sample 1; (b) segmentation results of (a); (c) binarization results of (b); (d) enhanced/smeared 
front side of sample 2; (e) segmentation results of (d); (f) binarization results of (e) 



4 Robust Threshold Decision in Canny Edge Detector 

The edge strength images of the original front images and of their enhanced/smeared
images are shown in Fig. 4(a), (b) and (c), (d), respectively, where the magnitude of
the gradient is converted into a gray level value: the darker the edge, the larger the
gradient magnitude. It is obvious from Fig. 4(a) and (b) that without the
enhancement/smearing processes, the edge strength of strong interfering strokes is
similar to that of the foreground strokes. Thus it is difficult to set a universally valid
pair of dual thresholds for the Canny edge detector in conventional methods. This is
especially so in view of the great variety of the relative strengths between the
foreground and interfering strokes among these archival documents. In fact, it is
sometimes even impossible to set one single set of thresholds for the same page due to
the variation of stroke intensity across the page. On the other hand, from Fig. 4(c)
and (d), it is seen that the enhancement/smearing processes have significantly
highlighted the foreground strokes against the interfering strokes.
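To make the role of the dual thresholds concrete, the following is a minimal sketch of the hysteresis step used in Canny-style detectors in general. It is not the improved detector of [14] [15]; the function name `hysteresis` and the toy magnitudes are assumptions for illustration.

```python
from collections import deque

def hysteresis(mag, low, high):
    """Dual-threshold edge linking on a gradient-magnitude grid (list of rows).

    Pixels >= high are strong seeds; pixels >= low are kept only if they are
    8-connected to a seed. Returns a 0/1 edge map.
    """
    h, w = len(mag), len(mag[0])
    edge = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:
                edge[y][x] = 1
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edge[ny][nx] and mag[ny][nx] >= low):
                    edge[ny][nx] = 1
                    queue.append((ny, nx))
    return edge

# A strong stroke (9) with a weak continuation (5) and an isolated weak pixel.
mag = [[9, 5, 5, 0, 5],
       [0, 0, 0, 0, 0]]
edges = hysteresis(mag, low=4, high=8)
```

In this toy case the weak continuation is kept and the isolated weak pixel is dropped. If interfering strokes reach the same magnitude as foreground strokes, no choice of `low` and `high` separates them, which is the difficulty described above.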



Fig. 4. Magnitude of gradient of detected edges: (a) front side of sample 1 before
enhancement/smearing; (b) front side of sample 2 before enhancement/smearing; (c) front
side of sample 1 after enhancement/smearing; (d) front side of sample 2 after
enhancement/smearing

The enhancement and smearing processes work with each other to the advantage of
our desirable results. Generally, a lower value for the Canny detector's upper
threshold is adopted to detect as many features as possible. The enhancement feature
image may have erroneously picked up interfering strokes as enhancement features,
resulting in a noisy enhancement feature image. However, with the same lower
threshold value, more smearing features will also be included in the smearing feature
image. Some of the smearing features will be in the overlapped areas (partially or
fully) with the falsely identified strokes in the enhancement feature image. As a
result, these false alarms will be eventually suppressed by the subsequent smearing
process. The novelty of this property is that as long as a smearing feature covers any
part of a mistaken enhancement feature, this false positive will eventually be smeared
away. The unwanted strokes will finally be sifted out by the cancellation effect of the
smearing process. This collaborative nature between enhancement and smearing makes
our method robust to the threshold setting. In other words, unlike the conventional
edge detection, the final detection of the foreground strokes from the enhanced/smeared
image using our approach is not so sensitive to the threshold value.
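The cancellation effect can be pictured at the level of binary feature masks. The sketch below is a deliberate simplification and not the authors' procedure, which operates on wavelet coefficients: it assumes each enhancement feature is a 4-connected component and removes any component that a smearing feature touches. The function name `suppress_overlapped` and the toy masks are hypothetical.

```python
def suppress_overlapped(enh, smear):
    """Remove every 4-connected component of `enh` that touches `smear`.

    enh, smear: 0/1 masks (lists of rows) of equal size. This mimics, at the
    mask level only, the idea that a falsely enhanced stroke is cancelled
    once any part of it is covered by a smearing feature.
    """
    h, w = len(enh), len(enh[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in enh]
    for y in range(h):
        for x in range(w):
            if enh[y][x] and not seen[y][x]:
                # Flood-fill one component and note whether it is covered.
                stack, comp, covered = [(y, x)], [], False
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    covered = covered or bool(smear[cy][cx])
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and enh[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if covered:
                    for cy, cx in comp:
                        out[cy][cx] = 0
    return out

enh = [[1, 1, 0, 1],
       [0, 0, 0, 1]]      # left stroke is a false alarm, right stroke is genuine
smear = [[0, 1, 0, 0],
         [0, 0, 0, 0]]    # the smearing feature covers part of the left stroke
cleaned = suppress_overlapped(enh, smear)
```

Lowering the threshold adds features to both masks, so extra false alarms in `enh` tend to come with extra coverage in `smear`, which is one way to read the robustness argument above.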



5 Experiment Results 

Over 200 scanned images of historical handwritten documents from the National
Archives of Singapore were tested in our experiment. These images were scanned at 150
dpi and saved in TIFF format without compression. Most of the images are moderately
noisy and were satisfactorily cleaned up. To assess the performance of our method
especially for difficult cases, 12 images with severe interference were selected for
evaluation. The selected images were visually inspected to assess the readability of
the extracted words. Fig. 5 shows all the 12 sample images in cut-off strips and the
final binary segmentation, while Fig. 6 gives a full view of one of the images and its
final result. The well-known Information Retrieval measures, precision and recall
(defined below), are used to measure the performance of the proposed method [18].

\[
\text{Precision} = \frac{\text{No. of Correctly Detected Words}}{\text{No. of all Words detected by the System}} \tag{14}
\]

\[
\text{Recall} = \frac{\text{No. of Correctly Detected Words}}{\text{Total No. of Words present in the Document}}
\]

Precision reflects the performance of removing the interfering strokes and recall re- 
flects the performance of restoring the foreground words. The results in Table 1 show 
average precision and recall rates of 84% and 96% respectively. 
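As a small worked check, the two measures and the reported averages can be computed directly. The function name `precision_recall` is an assumption; the per-image percentages are those listed in Table 1.

```python
def precision_recall(correct, detected, present):
    """Return (precision, recall) as fractions.

    correct:  number of correctly detected words
    detected: number of all words detected by the system
    present:  total number of words present in the document
    """
    return correct / detected, correct / present

# Per-image percentages copied from Table 1 (images 1 to 12).
precision_pct = [91, 86, 76, 94, 92, 80, 79, 75, 84, 82, 91, 78]
recall_pct = [98, 100, 90, 94, 98, 94, 89, 97, 97, 98, 96, 96]

avg_precision = sum(precision_pct) / len(precision_pct)
avg_recall = sum(recall_pct) / len(recall_pct)
```

The averages of the tabulated values come out at 84% for precision and, after rounding, 96% for recall, which agrees with the figures quoted in the text.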



Table 1. Evaluation of the proposed method

Image no.      1      2      3      4      5      6      7
Total words    132    124    103    125    125    123    121
Precision      91%    86%    76%    94%    92%    80%    79%
Recall         98%    100%   90%    94%    98%    94%    89%

Image no.      8      9      10     11     12     Average
Total words    128    112    113    114    114
Precision      75%    84%    82%    91%    78%    84%
Recall         97%    97%    98%    96%    96%    96%

Fig. 5. Sample images in Table 1 (Image 1 to Image 12) and their final binarization results

6 Conclusion 

The problem of interfering images of handwriting from the reverse side of an archival
document due to the seeping of ink has been solved by our wavelet-based segmentation
method. The enhancement/smearing algorithm presented in the paper performs
well even for cases containing weak foreground strokes among strong interference.
The whole system including the improved Canny edge detector [14] [15] is able to
segment the foreground writing from the interfering strokes effectively. One problem
encountered presently is in getting a perfect manual overlay between the front and
reverse side images due to differences between both images caused by factors like
document skews, different scales during image capture, and warped surfaces at books'
spine areas. Future work is to develop a computer-aided overlay process to
take over the present manual image matching.



Acknowledgement 

This project is jointly supported by the National Science and Technology Board and 
the Ministry of Education, Singapore, under the joint research grant R-252-000-071- 
112/303. The provision of archival documents by the National Archives of Singapore 
is gratefully acknowledged. 




Fig. 6. One whole original page image and its final binarization results 



References 

1. Nagy, G.: Twenty Years of Document Image Analysis in PAMI. IEEE Trans. 
PAMI, Vol. 22, No. 1, Jan. 2000, 38-62 

2. Casey, R.G., Lecolinet, E.: A Survey of Methods and Strategies in Character
Segmentation. IEEE Trans. PAMI, Vol. 18, No. 7, July 1996, 690-706

3. Negishi, H., Kato, J., Hase, H., Watanabe, T.: Character Extraction from Noisy
Background for an Automatic Reference System. In: Proc. 5th Int. Conf. Document
Analysis and Recognition, Bangalore, India, Sept. 1999, 143-146







4. Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE 
Trans. System, Man, and Cybernetics, Vol. 9, No. 1, 1979, 62-66 

5. Liu, Y., Srihari, S.N.: Document Image Binarization Based on Texture Features. 
IEEE Trans. PAMI, Vol. 19, No. 5, May 1997, 540-544 

6. Liang, S., Ahmadi, M.: A Morphological Approach to Text String Extraction 
from Regular Periodic Overlapping Text/Background Images. Graphical Models 
and Image Processing, CVGIP, Vol. 56, No. 5, Sept. 1994, 402-413 

7. White, J.M., Rohrer, G.D.: Image Thresholding for Optical Character Recognition
and Other Applications Requiring Character Image Extraction. IBM J. Res.
Dev. 27(4), 1983, 400-410

8. Don, H-S.: A Noise Attribute Thresholding Method for Document Image Binarization.
In: Proc. 3rd Int. Conf. Document Analysis and Recognition, 1995, 231-234

9. Lu, J., Healy, D.M., Weaver, J.B.: Contrast Enhancement of Medical Images
Using Multi-scale Edge Representation. Optical Engineering, 33(7), 1994, 2151-2161

10. Lu, J.: Image De-blocking via Multi-scale Edge Processing. In: Unser, M.A., 
Aldroubi, A., Laine, A.F. (eds.): Proc. of SPIE, Wavelet Applications in Signal 
and Image Processing IV, Vol. 2825, Part two, Denver, Colorado, Aug. 1996, 
742-75. 

11. Mallat, S., Zhong, S.: Characterization of Signals from Multi-scale Edges. IEEE 
Trans. PAMI, Vol. 14, No.7, July 1992, 710-732 

12. Hwang, W.L., Chang, F.: Character Extraction from Documents Using Wavelet 
Maxima. In: Unser, M.A., Aldroubi, A., Laine, A.F. (eds.): Proc. of SPIE, 
Wavelet Applications in Signal and Image Processing IV, Vol. 2825, Part two, 
Denver, Colorado, Aug. 1996, 1003-1015 

13. Etemad, K., Doermann, D., Chellappa, R.: Multi-scale Segmentation of Unstructured
Document Pages Using Soft Decision Integration. IEEE Trans. PAMI, Vol.
19, No. 1, Jan. 1997, 92-96

14. Cao, R., Tan, C.L., Wang, Q., Shen, P.: Segmentation and Analysis of Double-Sided
Handwritten Archival Documents. In: Proc. 4th IAPR Int. Workshop on
Document Analysis Systems, Rio de Janeiro, Brazil, Dec. 2000, 147-158

15. Tan, C.L., Cao, R., Shen, P., Chee, J., Chang, J.: Removal of Interfering Strokes 
in Double-Sided Document Images. In: Proc. 5th IEEE Workshop on Applica- 
tions of Computer Vision, Palm Springs, California, Dec. 2000, 16-21 

16. Feng, L., Tang, Y.Y., Yang, L.H.: A Wavelet Approach to Extracting Contours of 
Document Images. In: Proc. 5th Int. Conf. Document Analysis and Recognition, 
Bangalore, India, Sept. 1999, 71-74 

17. Niblack, W.: An Introduction to Digital Image Processing. Englewood Cliffs, 
N.J., Prentice Hall (1986) 115-116 

18. Junker, M., Hoch R., Dengel, A.: On the Evaluation of Document Analysis Com- 
ponents by Recall, Precision, and Accuracy. In: Proc. 5th Int. Conf. Document 
Analysis and Recognition, Bangalore, India, Sept. 1999, 713-716 




Wavelet Packets 

for Lighting-Effects Determination 



Abbas Z. Kouzani and S. H. Ong 

School of Engineering and Technology, Deakin University 
Geelong, Victoria 3217, Australia 



Abstract. This paper presents a system to determine lighting effects 
within face images. The theories of multivariate discriminant analysis 
and wavelet packets transform are utilised to develop the proposed sys- 
tem. An extensive set of face images of different poses, illuminated from 
different angles, are used to train the system. The performance of the 
proposed system is evaluated by conducting experiments on different 
test sets, and by comparing its results against those of some existing 
counterparts. 



1 Introduction 

The appearance of a person is highly dependent on the lighting conditions. Often 
slight changes in lighting produce large changes in the person’s appearance. Since 
the face images in the known face database are taken under front-lit lighting, 
recognition of a face image taken under a different lighting condition becomes 
difficult. Determining the lighting effects within a face image is therefore the 
first crucial step of building a lighting invariant face recognition system. 

While there has been a great deal of literature in computer vision detailing 
methods for face recognition, few efforts have been devoted to image variations 
produced by changes in lighting. In general, recognition algorithms have either 
ignored lighting variation, or dealt with it by measuring some properties or 
features of the image which are at least insensitive to the variability. Yet, features 
do not contain sufficient information necessary for recognition. Furthermore, 
faces often produce inconsistent features under different lighting conditions.

In this paper, a hybrid method is proposed based on theories of multivariate 
discriminant analysis and wavelet packets transform to classify face images based 
on the lighting effects present in the image. An extensive set of face images of 
different poses, illuminated from different angles, are used in the training of the 
system. 

The paper is organised as follows. In Section 2, the existing work is reviewed. 
Section 3 presents the lighting-effects determination system. In Section 4, the 
experimental results are presented and discussed. Finally, the concluding remarks 
are given in Section 5. 



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 188-199, 2001.
© Springer-Verlag Berlin Heidelberg 2001






2 Review of Existing Methods 

To handle image variations that are due to lighting, three main methods have 
been used in the literature. These methods, used by object recognition systems 
as well as by systems that are specific to faces, are explained below. 



2.1 Shape from Shading 

The shape-from-shading method [1] utilises the gray-level information to determine
the 3D shape of the object. Most algorithms that attempt to determine
shape from shading are designed for images of arbitrary objects with smooth
brightness variations [1]. These algorithms estimate shapes from the limited
information contained within an image. However, since knowledge about the
surface of human heads is not used by these algorithms, the estimation of head
shapes from the limited information of a 2D image restricts their performance in
practical applications, and they are therefore unsuitable for face recognition.

2.2 Image Representation Models 

Ideally, an image representation should be invariant to lighting changes. It has 
been theoretically shown that a representation which is invariant to lighting 
does not exist for unconstrained 3D objects [2]. However, for certain classes of 
objects this limitation does not necessarily apply. Four image representations are
explained below.

1. Edge Maps: Intensity edges coincide with gray-level transitions. Gray-level
transitions can be due to discontinuities in the surface colour or orientation.
Such edges are expected to be insensitive to lighting changes. The advan- 
tage of using an edge representation is that it is a relatively compact rep- 
resentation. Such an edge representation is used by several face recognition 
systems [3]. 

2. Gabor-Like Filters: Physiological evidence indicates that at the early 
stages of the human visual system the images are processed by local, mul- 
tiple, and parallel channels that are sensitive to both spatial frequency and 
orientation. Several face recognition systems filter the gray-level image by a 
set of 2D Gabor-like functions before attempting to recognise the faces in the 
image [3,4]. Convolving the image with 2D Gabor-like filters is often similar 
to enhancing edge contours, as well as valleys and ridge contours from the 
image. 

3. Derivatives of Gray-Level: Derivatives of the gray-level distribution were
used by several face recognition systems [3] to reduce the effects of changes in
lighting conditions on face images. The derivatives used include directional
and non-directional first- and second-order derivatives. It can be shown
analytically that, under certain conditions, changes in ambient light will
affect the gray-level image but not its derivatives. However, this is not the
case in the natural lighting conditions where the direction of the light source
is also changed.

4. Logarithmic Transformation: Logarithmic transformation is a non-linear 
transformation of the image intensities used in computer vision [5]. There 
is physiological evidence that logarithmic transformation approximates the 
response of cells in the retina of the human eye. 

Adini et al. [6] reported that for most image representations considered, the
percentage of misrecognition was above 50 percent. Therefore, the above listed
image-representation methods can be used in a lighting-effects determination
system.



2.3 Example-Based Models 

An example-based method handles image variations that are due to lighting 
differences by using, as a model, an explicit 3D model or, alternatively, a number 
of corresponding 2D face images taken under different lighting conditions. A 
number of 2D images can either be used as independent models or combined 
into a model-based recognition system such as those described in [7]. In the 
following, three examples of this method are given. 

1. Independent Image Comparison: The face model here consists of a large 
set of images of the same face containing all possible variations. The recogni- 
tion process involves the comparison of the distances between an input image 
and all the images comprising the model. A problem with this approach is 
that the number of images that the model must contain may be very large. 
Furthermore, this approach has limited generalisation capacity beyond the 
parameter values that are sampled and stored. 

2. Learning the Lighting Direction: Learning the input/output mapping
from examples is a powerful problem-solving mechanism, once a large number
of examples is available. Brunelli [8] used one crude 3D head model
to generate computer-generated masks for modulating the intensity of 2D
front-view face images in order to produce images illuminated from different
angles. The produced images are used for training a HyperBF network
in which the lighting direction of the light source is associated with a vector
of measurements derived from a front-view face image. The images for
which the lighting direction must be computed are very constrained: they
are front-view faces with a fixed inter-ocular distance [9]. In addition, the
calculation and compensation of the lighting direction are done based on a
simple lighting model of the light source that does not represent a variety of
complicated lighting conditions which exist in practical situations.

3. Fisherfaces: This approach, which is reported to perform better than the
others, was proposed by Belhumeur et al. [10]. The idea is to produce classes
in a low-dimensional face image subspace obtained from linearly projecting
a high-dimensional image space to the subspace. The multivariate discriminant
analysis [11] is used to select the most discriminating features. In the most
discriminating feature space, the factors that are not related to classification
are discarded or weighted down, and factors that are crucial to classification
are emphasised. Belhumeur et al. have conducted experiments on fisherfaces
and three standard face recognition methods including the eigenfaces, and
have reported lower error rates for the fisherfaces method. A drawback of
this approach is that the transformation coefficients of different classes are
very close to each other, compared to the other methods. That will cause
false recognition [12].

2.4 Discussions 

Among the methods described above, the image representation models improve 
the accuracy of the recognition, but fail to offer a robust invariance to lighting
changes [6]. The example-based methods such as Brunelli’s method [8] are
promising and can produce better results than those of the image representation 
models. However, the performances of the existing example-based methods are 
still not satisfactory and there is plenty of room for improvement. 

3 Proposed System 

In the proposed system, the theories of the multivariate discriminant analysis 
and the wavelet packets transform are combined to form a learning system for 
determining the lighting-effects in the input face image. This combination is 
explained in the following. 



3.1 Multivariate Discriminant Analysis 

Multivariate discriminant analysis performs dimensionality reduction using linear
projection [11]. Each image is considered as a sample point in a high-dimensional
image space. A problem with this method is that the within-class scatter
matrix [11] is singular in the computation of the lighting direction for face images.
This stems from the fact that the number of images in the training set is much
smaller than the number of pixels in each image.
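The singularity can be made explicit with the usual statement of the discriminant criterion. The notation below is assumed for illustration and is not taken from the paper: c classes, N training images in total, N_i images in class X_i, n pixels per image, class means \mu_i, and overall mean \mu.

```latex
S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T, \qquad
S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T

W_{opt} = \arg\max_{W} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}

\operatorname{rank}(S_W) \le N - c
```

Since S_W is an n × n matrix whose rank is at most N - c, it cannot be inverted whenever N - c < n, which is the situation when the training images are far fewer than the pixels per image.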

In order to overcome the complication of the singular within-class scatter
matrix, the Principal Component Analysis (PCA) [13] is employed. The PCA
builds a low-dimensional face space from a high-dimensional image space using
example face images. The face space built by the PCA is an approximation of the
real face space. But in order to have a reasonable approximation of the real face
space, a large number of face images should be presented to the PCA method.
If a large number of face images is not available, the PCA builds a face space
that poorly approximates the real face space.

We propose the utilisation of the wavelet packets projection for reducing
the dimensionality of face images. In order to overcome the complication of the
singular within-class scatter matrix, the training image set is first projected to
a lower-dimensional space using the wavelet packets transform. This projection
reduces the size of the within-class scatter matrix, which allows the matrix to
become non-singular and invertible. Then, the discriminant analysis projection
is performed in the space of the wavelet packets projection.

3.2 Wavelet Packets 

The main difference between the wavelet packets transform and the wavelet 
transform is that, in the wavelet packets, the basic two-channel filter bank can be 
iterated either over the low-pass branch or the high-pass branch. This provides 
an arbitrary tree structure with each tree corresponding to a wavelet packets 
basis. The decision to split or merge is aimed at achieving minimum distortion. 

Best Basis Method: The wavelet packets transform offers a choice of optimal 
bases for the representation of a specific signal [14]. Therefore, it is possible 
to seek the best basis by a criterion. The chosen basis should carry substantial 
information about the signal. Since compression is the goal, the basis which min- 
imises the number of significantly non-zero coefficients in the resulting transform 
is chosen. Entropy is a suitable cost function for compression. 
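The best-basis search can be sketched on a one-dimensional signal. This is a simplified illustration under stated assumptions: it uses the Haar filter pair, a binary tree instead of the quadtree needed for images, and the additive entropy cost -sum(c^2 log c^2); none of these choices is claimed to be the configuration used in this paper, and the function names are hypothetical.

```python
from math import log, sqrt

def haar_split(x):
    """One level of the orthonormal Haar two-channel filter bank."""
    low = [(x[i] + x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    return low, high

def entropy_cost(coeffs):
    """Additive entropy cost: -sum(c^2 * log(c^2)), with 0 * log 0 taken as 0."""
    total = 0.0
    for c in coeffs:
        e = c * c
        if e > 0.0:
            total -= e * log(e)
    return total

def best_basis(x, depth):
    """Return (cost, leaves) of the cheapest wavelet packet basis for x.

    Each leaf is (path, coefficients); path is a string of 'L'/'H' branch
    choices. A node is split only if its children are strictly cheaper.
    The length of x is assumed to be divisible by 2 ** depth.
    """
    def visit(node, path, level):
        own = entropy_cost(node)
        if level == depth or len(node) < 2:
            return own, [(path, node)]
        low, high = haar_split(node)
        c_low, b_low = visit(low, path + "L", level + 1)
        c_high, b_high = visit(high, path + "H", level + 1)
        if c_low + c_high < own:
            return c_low + c_high, b_low + b_high
        return own, [(path, node)]
    return visit(list(x), "", 0)

# A constant signal: all energy should end up in one low-pass coefficient.
cost, leaves = best_basis([1.0] * 8, depth=3)
paths = [path for path, _ in leaves]
```

For a fixed signal energy, minimising this additive cost is equivalent to minimising the entropy of the normalised coefficient energies, so the selected basis is the one that packs the energy into the fewest coefficients.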

3.3 Selection of Best Basis for Face-Image Class 

The wavelet packets transform and the best basis selection algorithm find the
optimal basis for the representation of a specific signal such as face images. To
select the basis for face images, 200 gray-scale front-view 64 x 64 face images are
used as the training set. The training set is divided into four groups; each group
consists of 50 face images. The following experiment is separately performed
on each group of the face images. For each face image the stat-quadtree of
entropy values is first created. For each group, 50 stat-quadtrees are obtained.
Next, the entropy values of the 50 stat-quadtrees are averaged. This generates four
stat-quadtrees, one for each of the groups of the training set. Then, on each
stat-quadtree, the best basis selection algorithm is performed to pick out the best
basis from all the possible bases. The algorithm minimises the entropy values in
the stat-quadtree. After obtaining the four best bases for the four groups of the
training set, it is found that the four bases are the same. The maximum depth
of splitting is chosen as 6 (explained in the following).

3.4 Selection of Best Filter and Best Decomposition Level 

In the wavelet transform, the choice of filters is crucial not only for obtaining 
satisfactory reconstruction of the original signal, but also for determining the 
shape of the wavelet used for performing the analysis. 

To achieve the best compression of human face images, the best filter and
the best decomposition level in the wavelet packets transform must be chosen.
The best filter can be chosen by examining different filters and selecting the one
with the highest information packing capability. An experiment is carried out to
select the best filter and the best decomposition level for the face-image class.
Four groups of the training set of the face images are used in each experiment.
Each group contains 50 gray-scale front-view 64 x 64 face images.

Six types of orthonormal quadrature mirror filters (Haar, Beylkin, Coiflet, 
Daubechies, Symmlet, and Vaidyanathan) are examined. Each one of the six 
types of filters is used with a specific filter parameter. For instance, the Symmlet
filter is used with the number of vanishing moments varying from 4 to 10.
In addition, together with each particular type of filter and parameter, different 
levels of decomposition are used. The applicable range of the decomposition level 
in this experiment is 2-6. A total of 96 filter variants are constructed using all 
combinations of filter type, parameter, and decomposition level. 




Fig. 1. Best filter and best decomposition level selection results for orthonormal
quadrature mirror filters



Each of the above 96 filter variants is applied to each of the four sub-training
sets, and the best basis is searched and selected. Compression is then carried
out on all the training face images. After compression, the reconstruction is
performed on the compressed images. In the reconstruction stage, each face image
is reconstructed from 1%, 7.5%, and 15% of the most important information of
the transformed coefficients (the coefficients with the highest absolute values).
The rest of the coefficients are set to zero before reconstruction. Therefore three
images are reconstructed from each compressed image. The errors between the
original image and the three reconstructed images are calculated and summed.
This is done for all the 50 training face images. The average error is obtained
and stored. Figure 1 displays the best filter and the best decomposition level
selection results for the orthonormal quadrature mirror filters. Each entry on
the horizontal axis represents the measured error for a particular filter and a
certain decomposition level. For instance, entry 79 denotes the measured error
for the Coiflet filter with parameter 5 and the decomposition level 6.
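The error measurement for one filter variant can be sketched as follows. The sketch makes several assumptions that are not in the text: a plain 1D Haar wavelet transform stands in for the 2D wavelet packets transform, the error is taken to be the sum of squared differences, and the kept fraction is rounded up to at least one coefficient. All function names are hypothetical.

```python
from math import ceil, sqrt

def haar_forward(x):
    """Full orthonormal Haar wavelet transform of a length-2^k list."""
    out, approx = [], list(x)
    while len(approx) > 1:
        low = [(approx[i] + approx[i + 1]) / sqrt(2)
               for i in range(0, len(approx), 2)]
        high = [(approx[i] - approx[i + 1]) / sqrt(2)
                for i in range(0, len(approx), 2)]
        out = high + out          # coarser details go in front of finer ones
        approx = low
    return approx + out

def haar_inverse(c):
    """Inverse of haar_forward."""
    approx, pos = list(c[:1]), 1
    while pos < len(c):
        high = c[pos:pos + len(approx)]
        nxt = []
        for a, d in zip(approx, high):
            nxt += [(a + d) / sqrt(2), (a - d) / sqrt(2)]
        pos += len(approx)
        approx = nxt
    return approx

def keep_largest(coeffs, fraction):
    """Zero all but the ceil(fraction * n) coefficients of largest magnitude."""
    n_keep = max(1, ceil(fraction * len(coeffs)))
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:n_keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

def summed_error(signal, fractions=(0.01, 0.075, 0.15)):
    """Sum of squared reconstruction errors over the three kept fractions."""
    coeffs = haar_forward(signal)
    total = 0.0
    for f in fractions:
        recon = haar_inverse(keep_largest(coeffs, f))
        total += sum((a - b) ** 2 for a, b in zip(signal, recon))
    return total
```

Averaging `summed_error` over the training images of a group would give one point on the curve of Fig. 1; a filter with better information packing yields a smaller value.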

The results show that the Symmlet filter with 5 vanishing moments and
the decomposition level 6 is the best choice for the face image database. The
best basis, the best filter, and the best decomposition level selected for the
face-image class are employed for reducing the dimensionality of face images.
Although the best basis was obtained by using a training set with a limited
number of face images, it is experimentally found that adding more face images
to the training set does not significantly affect either the structure of the basis
or the compression ratio. A better reconstruction of a face image that is not in
the training set is possible using the wavelet packets transform than using the
PCA.



3.5 Lighting-Effects Determination System (LEDS) 

The LEDS takes an input face image and classifies it into one of the possible
lighting-effects classes under examination. The LEDS learns to compute the
lighting effects using the multivariate discriminant analysis and the wavelet packets
transform. Although the multivariate discriminant analysis has been used by 
Swets et al. [15] as the most discriminating feature, and later by Belhumeur et 
al. [10] as the fisherfaces, both the most discriminating feature and the fisherfaces 
were developed for the purpose of one-step recognition. In the LEDS, however, 
the utilisation of a combination of the multivariate discriminant analysis and 
the wavelet packets transform is proposed as an example-based scheme for de- 
termining the lighting effects, not for recognising faces. 

Algorithm 1 (Lighting-Effects Determination) The lighting-effects determination
process is performed in two stages as described in the following.

Training: This stage involves the following operations, which are performed only
once.

1. A training set of face images of different subjects is acquired. For each
possible lighting effect, one image is taken from each subject.

2. The face images of the training set are grouped into different lighting-effects
classes based on the lighting effects that they contain.

3. An image is manually selected from each class and is named the reference
image of the class. Although this selection is an arbitrary choice, the employed
principle is that the face should be located in the centre of the image.

4. All images of each class are aligned based on the associated reference image
using the pixel-based correspondence representation [12].

5. The multivariate discriminant analysis and wavelet packets transform are
applied to the training set to obtain a dimensionally reduced lighting space.

6. The set of weights obtained from projecting each face image of the training
set onto the lighting space is stored.

Determination: This stage involves the following operations to classify the input
face image into one of the lighting-effects classes.

1. The input face image is projected onto the lighting space and a set of weights
is calculated.

2. The weight pattern is classified into one of the lighting-effects classes using
the stored weight patterns of the face images in the training set.
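The final classification step of the determination stage can be sketched as below. The text does not name the classifier, so a nearest-neighbour rule with Euclidean distance is assumed here; the function name `classify_lighting`, the class labels, and the weight vectors are hypothetical.

```python
def classify_lighting(weights, stored):
    """Return the lighting-effects class whose stored weight pattern is closest.

    weights: weight vector of the input face image (list of floats)
    stored:  list of (class_label, weight_vector) pairs from the training stage
    The nearest-neighbour rule is an assumption, not the paper's stated method.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    label, _ = min(stored, key=lambda item: sq_dist(weights, item[1]))
    return label

# Hypothetical stored patterns for three lighting-effects classes.
stored = [("left-lit", [1.0, 0.0]),
          ("right-lit", [-1.0, 0.0]),
          ("top-lit", [0.0, 1.0])]
```

In the full system the weight vectors would come from projecting the aligned face image onto the lighting space obtained in training step 5.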



3.6 Training Set 

A collection of 63 3D head models is used to generate the training database
using computer graphics techniques. The head models have been generated from
stereo images obtained using the C3D system of the Turing Institute. The training
database contains 63 sets of 1331 2D full-face images of various poses within
±45° rotations about the X, Y, and Z directions and with a resolution of 9° (see
Figure 2).

Fig. 2. Face images rendered from a 3D head model of the Turing database

To quantify the effects of varying lighting, 66 different lighting conditions are
considered. For each of the 1331 poses, 66 full-face images are rendered under
different lighting conditions. In each image, a specific direction and distance of a
single light source are implemented. The longitude and latitude of the light
source direction are within 15° to 75° of the camera axis. First, the face images
are grouped based on the pose of each face. Each group representing a specific
pose contains 4158 face images of 63 people. Then, the face images within each
group are divided into 66 different classes. This classification is done based on
the lighting direction of each face image. Therefore, each group that represents
a specific pose will contain 66 classes of different lighting effects. It should be
stated that both the most discriminating feature and the fisherfaces put the face
images of one person taken under different lighting conditions into the same
class for the purpose of recognition. However, since the aim of this work is to
determine the lighting effects, only the face images with similar lighting effects
are put into the same class in the face space proposed here.
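The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch assumes that the rotation range and resolution are in degrees, so that each axis takes the values -45, -36, ..., 45; the variable names are illustrative.

```python
from itertools import product

# Rotation angles in degrees: -45 to +45 in steps of 9 gives 11 values per axis.
angles = list(range(-45, 46, 9))
poses = list(product(angles, repeat=3))   # (X, Y, Z) rotations

n_models = 63       # 3D head models
n_lightings = 66    # lighting conditions, one class each

images_per_pose = n_models * n_lightings
```

Eleven angles per axis give 11^3 = 1331 poses, and 63 heads under 66 lighting conditions give 4158 images in each pose group, which matches the figures in the text.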

4 Experimental Results 

To evaluate the performance of the LEDS, the results of experiments performed 
on three different test sets, are presented and discussed below. The test sets used 
in these experiments are as follows. 

- Test Set 1 contains 411 face images of the Harvard face database. In each
image in this database, a subject holds his head steady while being illuminated
by a dominant light source. The space of the light source directions
is then sampled in 15° increments. Figure 3 illustrates sample face images
from Test Set 1.

- Test Set 2 consists of 495 images constructed from the Yale face database.
The 165 face images of the Yale face database are first copied into the test
set. Then, 330 extra images are produced by rotating each image randomly
within the range of 10° to 90° twice in the 2D plane. These images are added
to the test set. Figure 4 illustrates sample face images from Test Set 2.

- Test Set 3 is constructed by the authors and contains 2710 face images. Face
images of ten people were used to build this test set. A set of 271 lighting
masks is superimposed on each face image to generate 271 images under
different lighting conditions. In each mask, a specific direction and distance of
a single light source are implemented. Figure 5 illustrates sample face images
from Test Set 3.

Face images of Test Sets 1-3 are aligned using the pixel-based correspondence
method [12], and are presented to two different systems. The first system uses the
multivariate discriminant analysis and the PCA for classification of the lighting
effects in the test face images. The PCA method has been trained on 200
front-view face images. The second system, that is the LEDS, uses the multivariate
discriminant analysis and the wavelet packets transform for classification of the
lighting effects in the test face images. The wavelet packets transform has also
been trained on the 200 front-view face images. Table 1 summarises the results
obtained from this experiment.

As can be seen from the table, the LEDS achieves a higher correct classification
rate for the lighting effects on all three test sets than the method
that uses multivariate discriminant analysis and PCA. The LEDS achieves a
classification rate of 86.7% for Test Set 3, which is not as high as the rates
obtained for Test Sets 1-2. The reason for this performance
is that the LEDS is trained on face images containing real lighting effects,
whereas the test images of Test Set 3 contain synthesised lighting effects in
which the lighting masks are simply superimposed on face images taken under
front-lit lighting. These lighting masks are simple approximations of the real
lighting effects. Therefore, the images produced are only an imitation of the
corresponding real illuminated face images. However, training the LEDS on
example images containing synthesised lighting effects can improve the correct
classification rate when the system is tested on this kind of face image.



Table 1. Classification of lighting effects for Test Sets 1-3

Method                                     Test Set   Correct Classification   Classification Rate
Ideal System                               1          411                      100%
                                           2          495                      100%
                                           3          2710                     100%
Multivariate Discriminant Analysis + PCA   1          377                      91.7%
                                           2          411                      83.0%
                                           3          2043                     75.4%
Proposed LEDS                              1          396                      96.3%
                                           2          458                      92.5%
                                           3          2349                     86.7%
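The classification rates in Table 1 are the ratio of correctly classified images to the size of each test set. The short sketch below recomputes them from the counts reported in the table; the dictionary layout and variable names are illustrative choices, not part of the original work.

```python
# Counts taken from Table 1: (correct classifications, test-set size).
results = {
    "MDA + PCA": {1: (377, 411), 2: (411, 495), 3: (2043, 2710)},
    "LEDS": {1: (396, 411), 2: (458, 495), 3: (2349, 2710)},
}

# Classification rate in percent for every method and test set.
rates = {
    method: {test_set: 100.0 * correct / total
             for test_set, (correct, total) in per_set.items()}
    for method, per_set in results.items()
}

for method, per_set in rates.items():
    for test_set, rate in per_set.items():
        print(f"{method}, Test Set {test_set}: {rate:.1f}%")
```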



5 Concluding Remarks 

A method based on multivariate discriminant analysis and the wavelet packets
transform has been proposed to classify face images according to the
lighting effects present in the image. An extensive set of face images of different
poses, illuminated from different angles, is used to train the system.
The performance of the system has been evaluated by conducting experiments
on different test sets and by comparing its results against those of the existing
counterparts. The system outperforms the existing counterparts
because it combines multivariate discriminant analysis with the wavelet packets
transform to determine the lighting effects, and because it uses training face
images containing realistic lighting effects. The system may fail to determine
a lighting effect in the input face image if the image contains a lighting effect
that is not covered by the system or an extreme lighting effect. The performance
of the system can be improved by increasing the number of face images in the
lighting-effects classes, and by including more lighting-effects classes in the
training sets.








References 

1. B. K. P. Horn and M. J. Brooks, Eds., Shape from Shading, MIT Press, Cambridge,
Mass., 1989.

2. Y. Moses and S. Ullman, “Limitation of non-model-based recognition schemes,”
in Proc. European Conference on Computer Vision, G. Sandini, Ed., 1992, pp.
820-828.

3. R. Brunelli and T. Poggio, “Hyperbf networks for real object recognition,” in
Proc. IJCAI, Sydney, Australia, 1991, pp. 1278-1284.

4. J. Buhmann, M. Lades, and F. Eeckman, “A silicon retina for face recognition,”
Tech. Rep. 8996-CS, Institute of Informatik, University of Bonn, 1993.

5. D. Reisfeld and Y. Yeshurun, “Robust detection of facial features by generalised
symmetry,” in Proc. International Conference on Pattern Recognition A, 1992, pp.
117-120.

6. Y. Adini, Y. Moses, and S. Ullman, “Face recognition: The problem of compensating
for changes in illumination direction,” IEEE Trans. on Pattern Analysis and
Machine Intelligence, vol. 19, no. 7, pp. 721-732, July 1997.

7. P. Hallinan, “A low-dimensional representation of human faces for arbitrary lighting
conditions,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition,
1994, pp. 995-999.

8. R. Brunelli, “Estimation of pose and illumination direction for face processing,”
Tech. Rep. TR-AI 1499, Massachusetts Institute of Technology, November 1994.

9. R. Brunelli and T. Poggio, “Face recognition: Features versus templates,” IEEE
Trans. on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp.
1042-1052, 1993.

10. P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces:
Recognition using class specific linear projection,” IEEE Trans. on Pattern
Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.

11. G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley,
New York, 1992.

12. A. Z. Kouzani, F. He, and K. Sammut, “Towards invariant face recognition,”
International Journal of Information Science, vol. 123, no. 1-2, pp. 75-101, 2000.

13. A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, John
Wiley and Sons, 2001.

14. R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis
selection,” IEEE Trans. Infor. Theory, vol. 38, no. 2, pp. 713-718, March 1992.

15. D. L. Swets and J. J. Weng, “Shoslif-o: Shoslif for object recognition and image
retrieval (phase ii),” Tech. Rep. CPS-95-39, Michigan State University, October
1995.



Translation-Invariant Face Feature Estimation Using 
Discrete Wavelet Transform 



Kun Ma and Xiaoou Tang 



Department of Information Engineering, The Chinese University of Hong Kong 

Shatin, Hong Kong 
xtang@ie.cuhk.edu.hk



Abstract. In this paper, we conduct a series of experiments to 
demonstrate the translation invariant property of a set of discrete 
wavelet features in a face graph. Using local-area power spectrum 
estimation based on discrete wavelet transform, we compute a feature 
vector that possesses both an efficient space-frequency structure and the 
translation invariant property. 



1 Introduction 

Wavelet transform has been widely studied in many aspects of image processing [2]
[3] [4]. In particular, since the discrete wavelet transform provides an efficient and
non-redundant space-frequency representation of a signal or image, it has been widely
studied in image compression and denoising research. In pattern recognition,
however, the discrete wavelet transform has not been widely used. A basic
requirement on a feature extraction method is translation invariance. That is, when a
pattern is translated, its feature descriptors should also be translated, but not modified
in form. This property does not hold for the wavelet coefficients generated by the
fast discrete wavelet transform. The conflict between non-redundant structure and
translation invariance is the main obstacle to wavelet applications in pattern
recognition.

In this paper we use a local area spectrum computation to estimate wavelet features 
for face graph registration. The extracted features are shown to closely approximate 
translation invariance. Unlike traditional methods which solve the translation 
invariance problem by restoring the full-density representation [1] [4] [5], our method 
still uses the efficient computational structure of the discrete wavelet transform. 



2 Translation Invariance 

A system is time invariant if a time shift in the input signal results in an identical time
shift in the output signal. If y(t) is the output of a continuous-time system given
x(t) as the input, the system is time invariant if



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 200-210, 2001. 
© Springer-Verlag Berlin Heidelberg 2001




\[ x(t - T) \rightarrow y(t - T), \tag{1} \]

where T is a time shift. Spatial shift invariance is the two-dimensional analog of time 
shift invariance. If the input image is shifted relative to its origin, the output image is 
shifted in the same way. The time shift invariance and spatial shift invariance are 
called translation invariance in general. In a pattern recognition system, the 
translation-invariance property is crucial for stable feature estimation. 



[Figure 1: two panels, each titled "CWT by Biorthogonal wavelet bior2.2": (a) CWT of original signal; (b) CWT of shifted signal.]

Fig. 1. Translation invariance of CWT. The signal in (b) is shifted 2 pixels to the right



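It is straightforward to see that both the continuous wavelet transform and the dyadic
wavelet transform are translation invariant. Let $f_\tau(t) = f(t - \tau)$ be a translation of
$f(t) \in L^2(\mathbb{R})$ by $\tau$. The continuous wavelet transform of $f_\tau(t)$ is

\[
\begin{aligned}
W f_\tau(u, s) &= \int_{-\infty}^{+\infty} f(t - \tau)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t - u}{s}\right) dt \\
&= \int_{-\infty}^{+\infty} f(t')\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t' - (u - \tau)}{s}\right) dt' \qquad (t' = t - \tau) \\
&= W f(u - \tau, s).
\end{aligned}
\tag{2}
\]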



Since the output is shifted the same way as the input signal, the continuous wavelet 
transform is translation invariant, as illustrated in Figure 1. The dyadic wavelet 
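transform of $f_\tau(t)$ is

\[
\begin{aligned}
W f_\tau(u, 2^j) &= \int_{-\infty}^{+\infty} f(t - \tau)\, \frac{1}{\sqrt{2^j}}\, \psi^*\!\left(\frac{t - u}{2^j}\right) dt \\
&= \int_{-\infty}^{+\infty} f(t')\, \frac{1}{\sqrt{2^j}}\, \psi^*\!\left(\frac{t' - (u - \tau)}{2^j}\right) dt' \qquad (t' = t - \tau) \\
&= W f(u - \tau, 2^j).
\end{aligned}
\tag{3}
\]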


This shows that the dyadic wavelet transform is also translation invariant. A dyadic 
wavelet transform example is shown in Figure 2. 
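The translation invariance of a transform that does not sample the translation parameter can be checked numerically. The sketch below is not the authors' code: it assumes a periodic signal and replaces the bior2.2 wavelet of the figures with a Haar-like detail filter evaluated at every sample, and the function name `undecimated_haar_detail` is hypothetical.

```python
import numpy as np

def undecimated_haar_detail(x, j):
    """Haar-like detail at scale 2**j, evaluated at every sample (periodic signal)."""
    x = np.asarray(x, dtype=float)
    half = 2 ** (j - 1)
    # np.roll(x, -k)[n] equals x[(n + k) % len(x)]
    first = sum(np.roll(x, -k) for k in range(half))
    second = sum(np.roll(x, -k) for k in range(half, 2 * half))
    return (first - second) / np.sqrt(2 ** j)

signal = np.zeros(64)
signal[20:35] = 1.0            # rectangular test signal
shift = 3                      # arbitrary shift, not a multiple of 2**j
shifted = np.roll(signal, shift)

# At every scale, the transform of the shifted signal is the shifted transform.
invariant = all(
    np.allclose(undecimated_haar_detail(shifted, j),
                np.roll(undecimated_haar_detail(signal, j), shift))
    for j in (1, 2, 3)
)
print(invariant)
```

Because no downsampling is applied, the output has as many coefficients per scale as the input has samples, which is the redundancy that the discrete wavelet transform removes.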



[Figure 2: two panels, each titled "Dyadic wavelet transform by Biorthogonal-2.2": (a) original signal f(t) and its dyadic wavelet transform Wf(u, 2^j); (b) shifted signal f(t - 2) and its dyadic wavelet transform Wf(u, 2^j).]

Fig. 2. Translation invariance of dyadic wavelet transform






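However, the discrete orthogonal/biorthogonal wavelet transform is not translation
invariant. Let $f_\tau(t) = f(t - \tau)$ be a translation of $f(t) \in L^2(\mathbb{R})$ by $\tau$. The
orthogonal wavelets,

\[ \psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - 2^j n}{2^j}\right), \tag{4} \]

yield orthogonal wavelet transform coefficients,

\[ d_j[n] = W f(2^j n, 2^j) = \langle f, \psi_{j,n} \rangle. \tag{5} \]

Translating $f(t)$ by $\tau$ gives

\[
\begin{aligned}
d'_j[n] = W f_\tau(2^j n, 2^j) &= \langle f_\tau, \psi_{j,n} \rangle \\
&= \int_{-\infty}^{+\infty} f(t - \tau)\, \frac{1}{\sqrt{2^j}}\, \psi^*\!\left(\frac{t - 2^j n}{2^j}\right) dt \\
&= \int_{-\infty}^{+\infty} f(t')\, \frac{1}{\sqrt{2^j}}\, \psi^*\!\left(\frac{t' - (2^j n - \tau)}{2^j}\right) dt' \qquad (t' = t - \tau) \\
&= W f(2^j n - \tau, 2^j).
\end{aligned}
\tag{6}
\]

From Eqs. (5) and (6), only when $\tau = 2^j k$, $k \in \mathbb{Z}$, do we have
$W f_\tau(2^j n, 2^j) = W f(2^j (n - k), 2^j)$, i.e. $d'_j[n] = d_j[n - k]$. This means that if the
translation is a multiple of $2^j$, the orthogonal wavelet coefficients of $f_\tau(t)$ are a
translation of the coefficients of $f(t)$; otherwise, these coefficients may be very
different. Therefore, the discrete orthogonal wavelet transform is not translation
invariant. For biorthogonal wavelets, a dual wavelet function is used for
reconstruction, and the above conclusion still holds.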

There are apparent differences between the dyadic wavelet transform and the 
discrete wavelet transform. The dyadic wavelet transform is a translation invariant 
representation because it does not sample the translation factor. But this creates a 
highly redundant signal representation. On the other hand, the discrete wavelet 
transform samples the time and scale in a dyadic grid, which is implemented by 
subband filters with downsampling operation. It has a very efficient computation 
scheme and a compact data structure. However, in such a multi-rate system, the
translation of the input signal does not produce a simple translation of the output, unless
the translation is a multiple of the corresponding downsampling factors.

We use a simple experiment on a rectangular signal $f[n]$ to illustrate this problem.
The discrete orthogonal wavelet transforms of both the original signal and a shifted
signal are shown in Figure 3. Comparing the two transforms, we see that the wavelet
coefficients $d_1[n]$ in the first layer (downsampling factor 2) shift by 1 unit without
changes in value. However, in the other layers (downsampling factors 4, 8, 16, ...)
the wavelet coefficients $d_j[n]$ ($j = 2, \ldots, 5$) change quite significantly.

This conflict between computational efficiency and translation invariance has greatly
hindered applications of the wavelet transform in pattern recognition.
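The behaviour described for the rectangular signal can be reproduced with a few lines of code. The following is a minimal sketch under stated assumptions: it uses a periodic Haar transform instead of the db2 wavelet of Figure 3, and the helper names `haar_dwt_level` and `haar_details` are hypothetical. A shift by 2 samples is a multiple of the first-level downsampling factor but not of the second-level one.

```python
import numpy as np

def haar_dwt_level(a):
    """One level of the periodic Haar DWT: returns (approximation, detail)."""
    even, odd = a[0::2], a[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_details(x, levels):
    """Detail coefficients d_1, ..., d_levels of a signal whose length is divisible by 2**levels."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        a, d = haar_dwt_level(a)
        details.append(d)
    return details

x = np.zeros(64)
x[21:40] = 1.0                  # rectangular signal
x_shift = np.roll(x, 2)         # shifted by 2 samples

d = haar_details(x, 2)
ds = haar_details(x_shift, 2)

# Level 1 (downsampling factor 2): coefficients simply move by one position.
level1_is_shift = np.allclose(ds[0], np.roll(d[0], 1))
# Level 2 (downsampling factor 4): no circular shift of d_2 reproduces the new coefficients.
level2_is_shift = any(np.allclose(ds[1], np.roll(d[1], k)) for k in range(len(d[1])))
print(level1_is_shift, level2_is_shift)
```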




[Figure 3: two panels, each titled "Discrete wavelet transform by Daubechies wavelet db2", showing the signal together with d1[n] to d5[n] and a5[n]: (a) original signal f[n] and its DWT representation; (b) shifted signal f[n - 2] and its DWT representation.]

Fig. 3. Discrete wavelet transform and translation variance






3 Fast Translation Invariant Feature Extraction 

Several approaches have been proposed to solve the translation invariance problem by 
restoring the full-density representation [1] [4] [5]. The methods either try to reduce 
computation complexity by introducing more storage complexity or try to reduce 
storage complexity by introducing more computation complexity. The non-redundant 
structure and translation invariant property seem incompatible when using fast DWT. 

For pattern recognition, the extracted features do not need to give a precise 
representation of the original image. The only requirement is that they can distinguish 
different patterns. So instead of using the DWT coefficients as image features 
directly, we estimate the local energy distribution in each subband as the feature 
values. 

Let $G(\omega)$ be the power spectrum density of a wide-sense stationary signal $f(t)$.
The time-scale based spectral estimator can be written as the time marginal of the
scalogram (squared modulus of the wavelet transform) [1],

\[ G(\omega) = \frac{1}{N_j} \sum_n \left| \langle f(t), \psi_{j,n} \rangle \right|^2, \tag{7} \]

where $N_j$ is the number of wavelet coefficients in scale j. It has been proven that the
dyadic sampling grid in both time and frequency does not deteriorate the estimation
performance [1]. We can use this property to estimate the local area power spectrum.
For a small local area, the spectrum content of the image should remain relatively
constant with respect to translation. We now look into this property in a face image
matching study.

Let A be a small window around a fiducial point $p = (x_p, y_p)$ in the face image,

\[ A(p) = \{ I(x, y) \mid \|(x, y) - (x_p, y_p)\| \le \delta \}, \tag{8} \]

where $\delta$ defines the size of the neighborhood. In the DWT domain, the wavelet
coefficients corresponding to the window A are distributed in all subbands and form a
space-frequency tree. Let $R_k$ be the window of the set of related coefficients in the
k-th subband,

\[ R_k(p) = \{ W f_k(u, v) \mid \|(u, v) - (x_p, y_p)/2^l\| \le \delta / 2^l \}, \tag{9} \]

where $W f_k(u, v)$ represents the wavelet coefficients in the k-th subband and $l$ is the
decomposition level of that subband. However, such a space-frequency tree is not
suitable for fiducial point matching since the wavelet coefficients are not shift
invariant due to the downsampling at each level. To alleviate the problem, we use the
local square sum of wavelet coefficients within the small window $R_k$ at each level to
estimate the local area power spectrum around the fiducial point,




\[ \rho(R_k) = \sum_{R_k} \left| W f_k(u, v) \right|^2. \tag{10} \]

We then describe a fiducial point p by the vector

\[ J(p) = [\rho(R_1), \ldots, \rho(R_K)]^T, \tag{11} \]

where K is the number of subbands covered by the space-frequency tree. So each
element in the vector approximates the energy of a small area of the original image at
a particular location and in a particular frequency band. The whole vector can be seen
as a power spectrum estimate around each fiducial point in a face image. Given
two vectors $J$ and $J'$ in two face images, their similarity function can be defined as
the normalized correlation:

\[ S(J, J') = \frac{\sum_k \rho_k \rho'_k}{\sqrt{\sum_k \rho_k^2 \, \sum_k {\rho'_k}^2}}, \tag{12} \]

where $\rho_k$ and $\rho'_k$ are the elements of the $J$ and $J'$ vectors. This similarity function
gives a measure of whether two fiducial points are similar. The function has a value
close to one when the two fiducial points match each other closely. Such a measure is
important in face detection study.
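The feature vector and similarity measure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it uses a 2-D Haar transform, a square window whose half-width is halved at each level, a random array standing in for a face image, and a circular shift instead of zero padding. The names `haar_dwt2`, `spectrum_vector` and `similarity` are hypothetical.

```python
import numpy as np

def haar_dwt2(a):
    """One level of a 2-D Haar DWT; returns LL and the three detail subbands."""
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, (lh, hl, hh)

def spectrum_vector(image, p, half_width, levels):
    """Local subband energies around fiducial point p = (row, col).

    The image sides must be divisible by 2**levels.
    """
    a = np.asarray(image, dtype=float)
    feats = []
    for level in range(1, levels + 1):
        a, details = haar_dwt2(a)
        r, c = p[0] >> level, p[1] >> level      # window centre at this level
        h = max(half_width >> level, 1)          # window half-width at this level
        for band in details:
            win = band[max(r - h, 0):r + h, max(c - h, 0):c + h]
            feats.append(np.sum(win ** 2))       # local square sum of coefficients
    return np.array(feats)

def similarity(j1, j2):
    """Normalized correlation of two spectrum vectors."""
    return float(np.dot(j1, j2) / np.sqrt(np.dot(j1, j1) * np.dot(j2, j2)))

rng = np.random.default_rng(0)
image = rng.random((256, 256))                   # stand-in for a face image
shifted = np.roll(image, (7, 15), axis=(0, 1))   # displacement s = (7, 15)
j0 = spectrum_vector(image, (128, 128), 32, levels=3)
js = spectrum_vector(shifted, (135, 143), 32, levels=3)
s = similarity(j0, js)
print(round(s, 3))
```

Since all vector elements are non-negative energies, the similarity lies between 0 and 1, and it equals 1 when the two vectors are proportional.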



4 Translation Invariant Analysis Experiments 

In this section, we design a set of experiments to investigate the translation invariance
property of the local-area power-spectrum estimation method. Given a face image $I_0$
and its space-frequency tree $R_0$ centered at a point p, the local power spectrum
vector is $J_0$. We shift the original face to $I_s$ with displacement s, and extract a new
power spectrum vector $J_s$ corresponding to the space-frequency tree $R_s$ of the shifted
point p. To investigate the translation invariance of the power spectrum vector, we
measure the similarity function between the two vectors. If the similarity function
value is close to one, the two vectors are similar to each other and thus the
vector is translation invariant.

A face image $I_0$ of size 256x256 is shown in Figure 4(a), where the face portion
occupies an area of 128x128. A six-level DWT of the face image is displayed in
Figure 4(b). Figure 4(d) shows the fiducial space-frequency tree for a 64x64 window
centered around point p(64, 64). The DWT spectrum vector $J_0$ is computed from the
space-frequency tree. Figure 4(c) is the face area reconstructed from the
space-frequency tree in (d). We now shift the face image to $I_s$ with displacement s(7, 15),
and compute a new spectrum vector $J_s$ from the shifted space-frequency tree shown in
Figure 5. Note that zero padding is used to remove the boundary effect.

If we replace the space-frequency tree of $I_s$ with the space-frequency tree of $I_0$
and then reconstruct a new face image, as shown in Figure 6, we can see that the
reconstructed fiducial area changes dramatically compared with the original face.




This shows that the shifted space-frequency tree is very different from the original
one, so the space-frequency tree is not shift invariant. However, the similarity value of
the two local spectrum vectors is close to one. This shows that the local spectrum
vector does not change much with the shift. To further verify the shift invariance
property of the spectrum vector, we shift the face image over every point within an
area of 64x64. Then the vector similarity values are computed for all the shift
locations. The statistical results are shown in Figure 7 and Table 1. We can see that
the similarity values are very close to one, with an average of 0.97. Thus the local
spectrum vector closely approximates shift invariance.



5 Conclusion 

In this paper, we demonstrated the shift invariant property of local power spectrum 
vector of discrete wavelet transform using a set of experiments. Such a property is 
crucial for pattern recognition applications. It solves the basic conflict between 
efficient space-frequency representation and shift invariance. We are currently 
studying the application of this feature vector in face detection and face recognition 
research. 



Acknowledgments 

We thank the Computer Vision Center of Purdue University for the face image 
database. The work described in this paper was fully supported by an AOE in 
Information Technology grant and an RGC grant (Project no. CUHK 4190/01E) from
the Research Grants Council of the Hong Kong Special Administrative Region. 



References 

1. A. Antoniadis and G. Oppenheim, Wavelets and Statistics, Springer-Verlag, 1995.

2. C. K. Chui, An Introduction to Wavelets, Academic Press, Boston, 1992.

3. I. Daubechies, Ten Lectures on Wavelets, SIAM Publ., Philadelphia, 1992.

4. S. Mallat, A Wavelet Tour of Signal Processing, 2nd Ed., Academic Press, 1999.

5. E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, "Shiftable
multiscale transforms", IEEE Trans. on Information Theory, Vol. 38, No. 2, pp.
587-607, Mar. 1992.




[Figure 5, panels recovered: (c) reconstruction from (d); (d) a shifted space-frequency tree.]

Fig. 5. Shifted image and its space-frequency tree




Fig. 6. Reconstruction from space-frequency tree of Fig. 4(d) but at the position in Fig. 5(d) 











Table 1. Statistics of spectrum vector similarity values

Max     Min      Mean      S.T.D.
1       0.886    0.9705    0.0051



[Figure 7: (a) "Jet Similarity Distribution", the probability distribution of the spectrum vector similarity values; (b) "Graph Similarity Surface", the spectrum vector similarity values for all shifted locations.]

Fig. 7. Translation invariance verification



Text Extraction Based on Nonlinear Frame 



Yujing Guan 1 and Lixin Zhang 2

1 Jilin University Information Technologies Co. Ltd
Qianjin Rd. 95, Changchun, 130012, P. R. China
yjguan@hotmail.com

2 Mathematics Department, Jilin University
Changchun, 130012, P. R. China
zhang_lixin@163.com



Abstract. Locating and extracting text in images and video has been
studied over the past decade. No single method is robust for all kinds of
text, so it may be necessary to apply different methods to extract different
kinds of text and fuse these results temporarily. Finding new methods
is therefore important. In this paper, we combine order statistics and frame
theory to give a new method that can extract text of various colors and sizes
in a single pass; the experimental results are satisfactory.



1 Introduction 

In this era of information explosion, and especially with the development
of multimedia and the Internet, a great deal of information is presented as images
or video. How to obtain the information one wants from them
is becoming more and more important. In particular, locating and extracting text
in images is a very useful and challenging task. The text embedded in images
or video usually provides information about the names of people and organizations,
or about location, subject, date, time, scores, etc. Such text is a powerful
resource for indexing, annotation and content-oriented video processing. Many
researchers have therefore worked on this problem over the past decade, and many
methods have been proposed [1,2,3,4,5,10,11,12]. But it seems that each method has
its limitations. For example, current optical character recognition (OCR) technology is
restricted to finding text printed against clean backgrounds, and cannot handle
text printed against shaded or textured backgrounds or embedded in images.
As S. Antani noted in [1], none of the proposed text detection and localization
methods is robust for detecting all kinds of text, and it may be necessary to
apply different methods to extract different kinds of text and fuse these results
temporarily. This may be due to the inherent complexity of the problem,
but it makes it important to provide more methods from which people can select
according to the problem they face.

In [6] and [7] Dr. Ma and Dr. Tang apply order statistics to step-structure
detection and page segmentation, with good results. But their method
applies only to binary images and cannot be used for gray images. In this paper we
combine order statistics and frame theory and apply them to extracting text in
Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 211-216, 2001.
© Springer-Verlag Berlin Heidelberg 2001






complex backgrounds, and the results are satisfactory. As far as we know, they have
not been used in this field before.

The proposed method first partitions the gray image into a number of small
blocks of suitable size, for example 16 x 16, and proceeds to find text in each block.
The value of each pixel in the block is treated as a sample observation of a
random variable. The samples are sorted in ascending order to obtain an order statistic.
According to the text characteristics reflected in the order statistic, if there is
text in the block, there will be steps in the values of the order statistic. A frame is
used to detect the steps. Of course, the order statistic may contain a step even
when there is no text, so finally it is necessary to test whether the step
is formed by text by applying the text characteristics.

2 Order Statistic and Text Characteristics

Definition 1. Let $(X_1, X_2, \ldots, X_n)$ be a sample. The order statistic is the statistic
obtained by arranging $(X_1, X_2, \ldots, X_n)$ in ascending order. It is denoted by
$(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$, which satisfies

\[ X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}. \]

In practice, we first partition the gray image into several blocks of suitable
area. Suppose there are n pixels in a block; sorting their values in ascending order
gives the order statistic. If there is text in this block, the order statistic
has the following characteristics:

1. The values in the text comprise a subsequence of $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$, which we
denote $(x_{(b)}, x_{(b+1)}, \ldots, x_{(e)})$, and its mean is distinct from those of its left
subsequence and right subsequence.

2. There are k and K such that $k < e - b < K$.

3. There exists a positive constant $\varepsilon > 0$ such that $x_{(e)} - x_{(b)} < \varepsilon$.

4. There exists a positive constant $\sigma_t$ such that the variance of the text
subsequence is smaller than $\sigma_t^2$.

5. All pixels in the text form one or more curves.
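Characteristics 2 to 4 are simple numerical tests on a candidate subsequence of the order statistic. The sketch below shows one way they might be checked; the function names, the toy block and the threshold values are illustrative assumptions, and characteristics 1 and 5 (distinct means and curve structure) are not covered.

```python
import numpy as np

def order_statistic(block):
    """Pixel values of a block sorted in ascending order."""
    return np.sort(np.asarray(block, dtype=float).ravel())

def meets_characteristics_2_to_4(sub, k, K, eps, var_max):
    """Check k < e - b < K, x_(e) - x_(b) < eps, and variance < var_max."""
    sub = np.asarray(sub, dtype=float)
    if sub.size == 0:
        return False
    count_ok = k < sub.size - 1 < K        # e - b is the subsequence length minus one
    range_ok = sub[-1] - sub[0] < eps      # sub is sorted, so this is max - min
    var_ok = sub.var() < var_max           # var_max plays the role of sigma_t squared
    return bool(count_ok and range_ok and var_ok)

# Toy 3 x 3 block: four dark "text" pixels on a bright background.
block = np.array([[200, 52, 198],
                  [50, 201, 51],
                  [199, 50, 202]])
x = order_statistic(block)
text_like = meets_characteristics_2_to_4(x[:4], k=2, K=6, eps=5, var_max=4.0)
whole_block = meets_characteristics_2_to_4(x, k=2, K=6, eps=5, var_max=4.0)
print(text_like, whole_block)
```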

3 Frame Transform 

As stated in Section 2, the gray values of the text form a subsequence,
$(x_{(b)}, x_{(b+1)}, \ldots, x_{(e)})$, of the order statistic $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ of all the gray
values in a block. In the ideal case, the values in this subsequence are almost equal,
but the neighboring values on its left or right are significantly smaller
or larger. In other words, there are singularity points at the two endpoints of
the subsequence. If we can find the correct singularity points, we can go on to
separate the text from the background graphics. Wavelets have been
successfully and frequently applied to singularity detection, but they are not well
suited here.







Definition 2. The sequence $\{\phi_n\}_{n \in \Gamma}$ is called a frame of a Hilbert space H if
there exist two constants A > 0 and B > 0 such that for any $f \in H$,

\[ A \|f\|^2 \le \sum_{n \in \Gamma} \left| \langle f, \phi_n \rangle \right|^2 \le B \|f\|^2. \]

When A = B the frame is said to be tight.



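Example 1: Let N = 2n + 1 be a positive odd number, let $\psi(t)$ be a piecewise-constant
function that takes constant values of opposite sign on $0 \le t < N/2$ and on
$N/2 \le t < N$ and is zero otherwise, and let $\psi_n(t) = \psi(t - n)$; then
$\{\psi_n(t)\}_{n \in \mathbb{Z}}$ is a frame.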

Let $f_n = \langle f, \psi_n \rangle$; then the frame coefficient $f_n$ indicates the difference
of the two means of f on the left side and the right side of some point. In the rest
of this paper, we call it the Haar-N frame. Clearly, this kind of frame vanishes for
constants. As with wavelets, a larger absolute value of a frame coefficient indicates
a larger step. But a wavelet coefficient indicates a sharp change in the value of the
function at a point, whereas a Haar-N frame coefficient indicates a step between the means of
the function on the left side and the right side of a point. So the Haar-N frame is
not sensitive to noise, while high-frequency wavelet coefficients are very sensitive to
noise. This property makes the Haar-N frame well suited to detecting change points in
an order statistic. To simplify the discussion, denote the gray values of text, background,
text noise and background noise by X, Y, W1, W2 respectively, and suppose that W1 and
W2 are zero mean and that the samples of X + W1 are less than the samples of Y + W2.
Because we know no statistical property of the above other than that
E(X + W1) is less than E(Y + W2), we want to detect E(Y + W2) - E(X + W1)
to determine where the samples of the text order statistic are and where the samples
of the background order statistic are.
Noise usually makes the differences between adjoining points in
the order statistic decrease or vanish, and can even make the order statistic smooth.
It is therefore difficult to detect a step with wavelets, because a wavelet coefficient is
generally a linear combination of differences of adjoining points in the order statistic.
For example, if we use the Haar wavelet, the wavelet coefficient at the step point is
min(Y + W2) - max(X + W1), which is obviously very sensitive to noise and different
from E(Y) - E(X). The frame, by contrast, is not sensitive to noise.

From the proposition in [8, p. 139], we know that the variance of the noise is reduced.
But for our order statistic we cannot get such a good result, because we
do not know where the samples of text are and where the samples of background
are. On the other hand, although E(X + W1) and E(Y + W2) are unobtainable, we
can calculate the mean of some samples with greater values for the text and that
of some samples with smaller values for the background graphics respectively.
Finally we calculate the difference of these two means and base our detection on
it instead of E(Y + W2) - E(X + W1). Sometimes there is error, but it is better
than wavelets.






In fact, theoretically we have

\[ E(Y + W_2) - E(X + W_1) = \lim_{N \to \infty} \left( \frac{\sum_{i=0}^{N-1} (Y_i + W_{2,i})}{N} - \frac{\sum_{i=0}^{N-1} (X_i + W_{1,i})}{N} \right). \]

So the previous difference of two means is just an approximate estimate of
E(Y + W2) - E(X + W1) using finite samples, and at the same time it is also the Haar-N
frame transform. Since the number of text samples is unknown, we must choose
a suitable N.
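The difference between the two detectors can be illustrated on synthetic data. The sketch below is an assumption-laden illustration, not the authors' experiment: text and background values are drawn as Gaussian samples around 50 and 200, and the window length N = 20 is arbitrary. It compares the adjacent difference at the change point (the Haar wavelet response) with the difference of window means (the Haar-N style response).

```python
import numpy as np

rng = np.random.default_rng(0)
text = 50 + rng.normal(0, 5, size=100)          # samples of X + W1
background = 200 + rng.normal(0, 5, size=100)   # samples of Y + W2
x = np.sort(np.concatenate([text, background])) # order statistic of the block

i, N = 100, 20                                  # change point and window length
# Adjacent difference at the step: min(Y + W2) - max(X + W1).
adjacent_difference = x[i] - x[i - 1]
# Difference of the means of the N samples on each side of the step.
window_mean_difference = x[i:i + N].mean() - x[i - N:i].mean()

print(adjacent_difference, window_mean_difference)
```

Because the samples are sorted, the window-mean difference can never be smaller than the adjacent difference, so it moves the estimate toward E(Y) - E(X) = 150 from the value given by the two extreme samples alone.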

A large absolute value of the frame coefficient indicates a step, but a step
makes the absolute values of one or more coefficients large. Naturally, we should
choose the coefficient with the greatest absolute value to determine where the step
occurs. In wavelet theory, these points are called Maxima points (for more details
see [9] and [8, ch. 6]); we also use this concept in this paper. Moreover, there may
be large steps in the order statistic induced by the
complexity of the background graphics. Thus we need a threshold r to indicate
whether a step is large enough, since our assumption that the gray values of text
are distinct from those of the background graphics guarantees that the step at
the correct change point is not trivial. Note also that
our frame coefficients for the order statistic are all nonnegative in this paper. More
precisely, after we get the frame coefficients, we neglect those smaller than
r, and we only test whether the points corresponding to the remaining
frame coefficients are from text, using the text characteristics presented in Section 2.

4 Algorithms 

After partitioning the gray image into a number of contiguous regions of suitable
area, such as squares with m x n pixels, we arrange the values in one
block in ascending order and get the order statistic $(x_{(1)}, x_{(2)}, \ldots, x_{(mn)})$. We
use the stationarity of the gray values of the text and their distinction from the
other values from the background graphics to reduce the separation of text to
step-point detection. Wavelets are not well suited here, so we use the Haar-N frame.
The step points separate the order statistic into several subsequences, and we
use the text characteristics to decide which one or several are from text. Here
N > 0 is an integer and r is a threshold: if a frame coefficient $f_i$ satisfies
$|f_i| < r$, then $f_i$ cannot be a Maxima point formed by any step. Suppose the
image has M blocks of size m x n, $\varepsilon$ is the difference between the maximum
and minimum of the text samples, $\sigma_t^2$ is the maximum variance of the text, and k and K
are the minimum and maximum numbers of text points in one block if there is text in
this block. We give our algorithm as follows:

Algorithm : For every block do 

1. Get order statistic: Get the samples from the current block, and sort them in
ascending order to get the order statistic $X_{(0)}, \dots, X_{(mn-1)}$.






2. Calculate frame coefficients: for $i = 0, 1, \dots, mn-1$, calculate

$$f_i = \sum_{j=i}^{i+N-1} X_{(j)} - \sum_{j=i-N}^{i-1} X_{(j)}.$$

Here, when $j < 0$, $X_{(j)} = X_{(0)}$; when $j > mn - 1$, $X_{(j)} = X_{(mn-1)}$.

3. Find Maxima points: 

(a) For $i = 0, 1, \dots, mn-1$, if $f_i < r$, set $f_i = 0$.

(b) Find the maxima points. Suppose the number of maxima points is $a$; sort
them in ascending order and denote them as $\alpha_1, \alpha_2, \dots, \alpha_a$.
So there are $a + 1$ subsequences of the order statistic as follows:

$$\{X_{(\alpha_0)}, \dots, X_{(\alpha_1 - 1)}\},\ \{X_{(\alpha_1)}, \dots, X_{(\alpha_2 - 1)}\},\ \dots,\ \{X_{(\alpha_a)}, \dots, X_{(\alpha_{a+1})}\},$$

where we let $\alpha_0 = 0$, $\alpha_{a+1} = mn - 1$.

4. For every subsequence of the order statistic, $i = 0, 1, \dots, a$, decide whether
it is text according to the text characteristics in section 2.
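
Steps 1 to 3 of the algorithm can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the frame coefficient is taken as the plain difference of the two length-N sums (any normalizing constant is omitted), a maxima point is taken to be a coefficient strictly greater than its left neighbour and not smaller than its right neighbour, and the function names are hypothetical. Step 4 (the text test of section 2) is not covered.

```python
def haar_frame_coefficients(x, N):
    """Difference of the N samples from index i onward and the N samples before i.

    x is the order statistic (ascending). Indices outside the block are
    clamped to the first / last sample, as in the boundary rule of step 2.
    """
    n = len(x)

    def at(j):
        return x[min(max(j, 0), n - 1)]

    return [
        sum(at(j) for j in range(i, i + N)) - sum(at(j) for j in range(i - N, i))
        for i in range(n)
    ]


def split_at_steps(samples, N, r):
    """Sort a block, threshold the frame coefficients and split at the maxima."""
    x = sorted(samples)                      # step 1: order statistic
    f = haar_frame_coefficients(x, N)        # step 2: frame coefficients
    f = [v if v >= r else 0 for v in f]      # step 3(a): drop small coefficients
    maxima = []
    for i, v in enumerate(f):                # step 3(b): simple local-maximum rule
        left = f[i - 1] if i > 0 else 0
        right = f[i + 1] if i + 1 < len(f) else 0
        if v > 0 and v > left and v >= right:
            maxima.append(i)
    bounds = [0] + maxima + [len(x)]
    return [x[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

For a block with two clearly separated gray levels, the split falls at the first sample of the upper group, and each returned subsequence can then be passed to the text test of step 4.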



Fig. 1. A scanned image (left) and the extracted text image (right)



5 Examples 

We applied our new method to some pictures and obtained satisfactory results.
Fig. 1 (left) is a scanned image whose background is a purple flower; Fig. 1
(right) is the extracted text image. The text in Fig. 2 (left) was added by
computer: text in three colors (black, blue and red) was added, and the image was
converted to a gray image. Fig. 2 (right) is the text image extracted from the
background.





Fig. 2. Text image added by computer and the extracted text image

References 

1. S. Antani, D. Crandall, A. Narasimhamurthy, V. Y. Mariano, R. Kasturi,
Evaluation of Methods for Detection and Location of Text in Video, In Proc. 4th
IAPR International Workshop on Document Analysis Systems - DAS'2000, Rio Othon
Palace Hotel - Rio de Janeiro, 10-13 December 2000.

2. A. Antonacopoulos and D. Karatzas, An Anthropocentric Approach to Text
Extraction from WWW Images, In Proc. 4th IAPR International Workshop on Document
Analysis Systems - DAS'2000, Rio Othon Palace Hotel - Rio de Janeiro, pp.
515-525, 10-13 December 2000.

3. U. Gargi, S. Antani, R. Kasturi, Indexing Text Events in Digital Video
Databases, In Proc. International Conference on Pattern Recognition, Vol. 1, pp.
916-918, Aug. 1998.

4. Yassin M. Y. Hasan and Lina J. Karam, Morphological Text Extraction from
Images, IEEE Transactions on Image Processing, Vol. 9, No. 11, pp. 1978-1983,
Nov. 2000.

5. Huiping Li, David Doermann and Omid Kia, Automatic Text Detection and Tracking
in Digital Video, IEEE Transactions on Image Processing, Vol. 9, No. 1, pp.
147-156, Jan. 2000.

6. Hong Ma, Yong Yu, Li Ma, M. Umeda, Detection of Step-Structure Edge Based on
Order Statistic Filter, preprint.

7. Hong Ma, Zhou Jie, Yuanyang Tang, Nonlinear Stochastic Filtering Methods of
Adaptive Page Segmentation, preprint.

8. Stéphane Mallat, A Wavelet Tour of Signal Processing, Academic Press, San
Diego, 1998.

9. Stéphane Mallat and W. L. Hwang, Singularity Detection and Processing with
Wavelets, IEEE Trans. on Information Theory, 38:617-643, March 1992.

10. Anil K. Jain and Bin Yu, Automatic Text Location in Images and Video Frames,
Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998.

11. Victor Wu, Raghavan Manmatha, and Edward M. Riseman, TextFinder: An Automatic
System to Detect and Recognize Text in Images, IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 21, No. 11, pp. 1224-1229, Nov. 1999.

12. Yu Zhong, Hongjiang Zhang, and Anil K. Jain, Automatic Caption Localization in
Compressed Video, IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 22, No. 4, pp. 385-392, Apr. 2000.




A Wavelet Multiresolution Edge Analysis Method for 
Recovery of Depth from Defocused Images 



Wang Qiang¹, Hu Weiping¹, Hu Jianping², and Hu Kai²

¹ Dept. of Physics and Electronic Science, Guangxi Normal University,
Guilin, Guangxi, 541004
² Dept. of Computer Science and Engineering,
Beijing University of Aeronautics and Astronautics, Beijing 100083
qwang@public.glptt.gx.cn



Abstract. An approach to depth recovery from a defocused image based
on wavelet multiresolution analysis is proposed. The Lipschitz
exponent is used to describe the singularity of the edge of an object in
the image. A curve relating the Lipschitz exponent to the distance from
the object of interest to the camera is obtained. Experiments confirm
the effectiveness of the method.



1 Introduction 

With the spread of industrial automation, it is more and more difficult for human
vision to perform product testing on large-scale production lines. Computer
vision is becoming the key technique for raising production efficiency and
ensuring product quality, and it is more and more widely adopted. For example, a
computer vision system can be used for automatic testing in mechanical component
production and for monitoring or controlling conditions on a large-scale
production line. For ordinary purposes, a two-dimensional grayscale imaging
system can serve for product testing and monitoring. There are also many
applications that need a three-dimensional system to test the objects or products
of interest. In such applications the three-dimensional structure must be dealt
with and three-dimensional depth measurement must be performed.

Human eyes are a very effective depth measuring system. The depth of the surface
of an object of interest is obtained by combining the two planar images obtained
from the different viewpoints of the two eyes. Because this kind of stereo
viewing works under natural light, it belongs to passive measuring [1]. Though
human binocular viewing is very effective, it is very difficult for a computer
system to simulate such stereo vision. The first problem to be solved is that the
corresponding points between the two planar images must be found. This costs a
lot of time and requires complex computation. In addition, there may not be
sufficient information to set up a one-to-one correspondence at the points of
interest. Can we then bypass the search for corresponding points in a stereo
scene and get the depth information by analyzing and


Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 217-222, 2001. 
Springer- Verlag Berlin Heidelberg 2001 







computing a single grayscale image? We answer this question in this paper and
propose an approach to depth recovery from a defocused image based on wavelet
multiresolution analysis. Experiments confirm the effectiveness of the method.

The edge of an object in a scene is very important. It carries a lot of useful
information about the object. Usually, an image is processed only by edge finding
and binary coding [3], and a lot of useful information is wasted. It is
advantageous to use wavelet multiresolution analysis to analyze the singularity
of an object and to use some parameters to describe the different characteristics
of the object's edge in the scene [2] [5] [6]. This is very helpful for
understanding the content of a scene. In this paper we compute the Lipschitz
exponent of an object's image by wavelet transform multiresolution analysis and
use the Lipschitz exponent as a criterion to judge the object's degree of
defocus. We then obtain, as a curve, the relationship between the Lipschitz
exponent of the object of interest and its distance to the camera.

This paper is organized as follows. The principle of optical imaging of an object
in a scene is introduced first. Then the wavelet multiresolution edge analysis
and the method for computing the Lipschitz exponent are described. Some
experimental results are given in the following part. Finally we assess the
advantages of the proposed approach.



2 Object's Defocused Imaging



The front end of the computer vision system is the sensor. To satisfy the
real-time testing requirements of an industrial automation system and to reduce
the data processing load, a camera with 256 gray levels is usually selected as
the sensor. The optical system of the sensor can be abstracted as in figure 1 [4].







Fig. 1. Focused image of a point

Fig. 2. Defocused image of a point



In figure 1, suppose P is a single point on the surface of the object of interest.
Consider P as a luminous point in a 3D scene located on the optical axis of the
camera lens, as shown in figure 1. Light is emitted from point P in all
directions. The divergent bundle of rays passes through the lens and converges to
a point again on the optical axis. If the convergence point lies exactly on the
CCD image plane, it forms a sharp point (as shown in figure 1). Under this
condition, if the point is replaced by a real object, a clear image of the object
forms on the CCD image plane. If we change the position of point P by increasing
the distance between P and the camera lens, the image of P no longer converges on
the CCD image plane. This is shown in figure 2. Under this condition the point P
forms a blur circle instead of a sharply focused image point. The radius of the
blur circle (denoted by r) is related to the extent of defocus. If we replace the
point P with a real object under this condition, we obtain a blurred image on the
CCD plane.

In a practical application system, once the camera is selected, the distance
between the lens and the CCD plane remains unchanged. If the focal length is
fixed, then the distance at which an object of interest in the scene is imaged
clearly is also fixed. If we change the distance between the object and the
camera along the optical axis, the degree of blur of the object image also
changes. The further the object departs from the point P, the more severe the
defocus becomes. (Of course there are two cases of defocus, R > Ro and R < Ro,
but if we adjust the focal length of the camera to its minimum we need only
consider the case R > Ro.)

From the above analysis we can derive the following relation, where r denotes the
radius of the defocused blur area, Ro denotes the focal length and R denotes the
distance between the defocused object and the camera:

$$r = K\left(\frac{1}{R_0} - \frac{1}{R}\right) \qquad (1)$$

Here K is a coefficient. It depends on the distance between the lens and the CCD
plane of the camera, as well as on the focal length and the aperture of the
camera. Once all of these parameters are fixed, K remains unchanged. From formula
(1) we can then see that the radius of the defocused circle increases as the
distance increases.

Formula (1) can be rewritten in the following form:

$$R = \frac{R_0 K}{K - r R_0} \qquad (2)$$

This formula expresses that if Ro and K are given, the distance R between the
object of interest and the camera lens can be obtained by measuring the defocus
radius r. This principle has been used by a few researchers [4], [7]. In this
paper we introduce an algorithm based on wavelet multiresolution analysis to
implement the principle, and we introduce the Lipschitz exponent to describe the
regularity of the object edge in the scene so as to determine the distance from
the camera to the object of interest.
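
As a small numerical illustration of formulas (1) and (2), the following Python sketch converts between the distance R and the blur radius r. The function names and the value K = 600 used in the check are hypothetical and chosen only for illustration; in practice K must be calibrated for the camera.

```python
def blur_radius(R, R0, K):
    """Formula (1): blur radius for an object at distance R (focused distance R0)."""
    return K * (1.0 / R0 - 1.0 / R)


def distance_from_blur(r, R0, K):
    """Formula (2): distance R recovered from a measured blur radius r."""
    denom = K - r * R0
    if denom <= 0:
        # r >= K / R0 cannot be produced by formula (1) for any finite R > 0.
        raise ValueError("blur radius too large for the given R0 and K")
    return R0 * K / denom
```

Applying `distance_from_blur` to the output of `blur_radius` returns the original distance, which is a quick check that (2) is indeed the inverse of (1).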

If the objects in the scene have evident texture features, the extent of defocus
of the texture can also be used to measure the depth of the object [4]. In this
paper the general case is considered, so we only consider the extent of defocus
of the edge and contour.



3 Wavelet Multiresolution Analysis and Lipschitz Exponent 

From the above analysis we can conclude that the distance between the camera and
the point of interest can be measured by computing the radius of the defocused
area. The contour of a real object consists of countless points, so if the point
of interest is replaced by a real object and all of its points are in defocused
positions, we obtain a blurred contour. We can therefore measure the depth of an
object by analyzing the grayscale gradient (that is, the singularity) of its
edge. An effective way to do this is wavelet multiresolution edge analysis.







Wavelet transform 

The basic theory of the 2D image wavelet transform has been stated in many books
and reference materials [5], [6]. We do not describe the theory in detail here.
For a thorough presentation of the wavelet transform, refer to [5], [6].

In this paper the Mallat fast algorithm for the 2D wavelet transform is used as
follows:

j = 0

while (j < J)

$W^1_{2^{j+1}} f = S^d_{2^j} f * (G_j, D)$

$W^2_{2^{j+1}} f = S^d_{2^j} f * (D, G_j)$

$S^d_{2^{j+1}} f = S^d_{2^j} f * (H_j, H_j)$

j = j + 1

end of while

G, D and H are filter coefficients: G is the coefficient of the high frequency
band, H is the coefficient of the low frequency band, and D is the Dirac
function. For the values of these coefficients and the method of convolving them
with the 2D image signal, refer to [5].

Let $W^1_{2^j} f(x,y)$ and $W^2_{2^j} f(x,y)$ denote the horizontal and vertical
1D wavelet transforms respectively. The modulus of the 2D wavelet transform and
the direction of the gradient are then given respectively by

$$M_{2^j} f(x,y) = \sqrt{|W^1_{2^j} f(x,y)|^2 + |W^2_{2^j} f(x,y)|^2} \qquad (3)$$

$$A_{2^j} f(x,y) = \arctan\left(\frac{W^2_{2^j} f(x,y)}{W^1_{2^j} f(x,y)}\right) \qquad (4)$$
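
Formulas (3) and (4) map directly onto array operations. The sketch below assumes NumPy and that the two transform components at one scale are already available as arrays of equal shape; the function name is hypothetical. Note one deliberate deviation: `arctan2` is used instead of a plain arctangent of the ratio, so that the quadrant is resolved and a zero horizontal component does not cause a division by zero.

```python
import numpy as np


def modulus_and_angle(w1, w2):
    """Modulus (3) and gradient direction (4) from the two transform components.

    w1, w2: horizontal and vertical wavelet transform components at one scale.
    """
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    modulus = np.hypot(w1, w2)      # sqrt(w1**2 + w2**2), element by element
    angle = np.arctan2(w2, w1)      # angle in (-pi, pi], quadrant-aware
    return modulus, angle
```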



Lipschitz exponent computing 

We perform the dyadic wavelet transform on the image that contains the object of
interest. We keep the pixels whose wavelet transform modulus is above a given
threshold, which gives the contour of the object. Finding the modulus maxima
along the gradient direction of the transform modulus and linking these pixels
yields a chain of transform modulus maxima. We calculate the average value of the
transform modulus along the maxima chain for each scale and denote it by Mj. We
repeat this across dyadic scales (generally, 3 or 4 scales are enough).

It can be proved that the 2D wavelet transform satisfies the following inequality:

$$|W_{2^j} f(x,y)| \le k\, 2^j s_0^{\alpha - 1}, \qquad (5)$$

here $s_0 = \sqrt{2^{2j} + \sigma^2}$.

We compute the three parameters $k$, $\alpha$ and $\sigma$ so that the inequality
(5) is as close as possible to an equality for each dyadic scale. That is, we
minimize the following:

$$\sum_{j=1}^{J} \left( \log_2 |M_j| - \log_2(k) - j - \frac{\alpha - 1}{2} \log_2(\sigma^2 + 2^{2j}) \right)^2 \qquad (6)$$

The input parameters of (6) are the average modulus maxima $M_j$ and the dyadic
scale $j$. The $\alpha$ in (6) is the Lipschitz exponent, which describes the
regularity of the object edge. The $k$ and $\sigma$ are two other parameters
related to the object edge.
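
One possible way to carry out this minimization is sketched below. It is not necessarily the procedure used by the authors: it assumes NumPy, scans a user-supplied grid of candidate values for the smoothing parameter (called sigma here), and for each candidate solves an ordinary linear least-squares problem, because for fixed sigma the expression is linear in log2(k) and in (alpha - 1). The function name and the grid are hypothetical.

```python
import numpy as np


def fit_lipschitz(M, scales, sigmas):
    """Fit k, alpha, sigma to average modulus maxima M_j at dyadic scales j.

    Model: log2(M_j) = log2(k) + j + (alpha - 1)/2 * log2(sigma**2 + 2**(2j)).
    sigmas: candidate sigma values to scan (a simple grid search).
    """
    M = np.asarray(M, dtype=float)
    j = np.asarray(scales, dtype=float)
    y = np.log2(M) - j
    best = None
    for sigma in sigmas:
        u = 0.5 * np.log2(sigma ** 2 + 2.0 ** (2 * j))
        A = np.column_stack([np.ones_like(u), u])   # unknowns: log2(k), alpha - 1
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.sum((A @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, 2.0 ** coef[0], coef[1] + 1.0, float(sigma))
    _, k, alpha, sigma = best
    return k, alpha, sigma
```

With only three scales and three unknowns the fit is weakly constrained, so in practice the grid range for sigma should be chosen from prior knowledge of the expected blur.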



4 Experiment 

We use a 256-level grayscale industrial camera in the experiment. The focal
length of the camera is adjusted to its minimum (30cm). To decrease the effective
depth of field of the camera, the aperture is set to its maximum, which enhances
the sensitivity to changes in the distance of the object of interest. Five images
of the capital character D were taken in our experiment. Figure 3 (a) shows the
image of the character D at the focal plane of the camera (30cm). Figures 3 (b)
to (e) each increase the distance by 5cm from the previous one.




(a) (b) (c) (d) (e) 

Fig. 3. Images with different distance 

We use the quadratic spline as the wavelet. In the experiment we perform the
wavelet transform at the dyadic scales $2^j$ ($j = 1, 2, 3$) and record the
modulus and gradient angle computed by formulas (3) and (4). Then we threshold
the moduli and search for the modulus maxima along the direction of the grayscale
gradient. We obtain the modulus maxima chain by linking these pixels. The
averages of the pixel moduli along the chain are denoted by $M_1$, $M_2$, $M_3$
respectively.

Table 1 shows the experimental data. Here d denotes the distance, $M_1$, $M_2$,
$M_3$ denote the average values along the maxima chain, $\alpha$ denotes the
singularity of the edge, and $k$ and $\sigma$ are two other parameters.



Table 1. The data in experiment

sample   d (cm)   M1    M2    M3    α      k       σ
#1       30       103   119   135   0.24   83.8    0.00
#2       35       93    120   139   0.31   74.95   0.00
#3       40       63    96    128   0.50   46.34   0.72
#4       45       47    72    115   0.64   30.50   0.77
#5       50       34    55    94    0.65   24.22   12.83








We can draw a curve of the distance d against the corresponding Lipschitz
exponent $\alpha$, shown in figure 4.




Fig. 4. The curve of the relation between $\alpha$ and the distance between
object and lens (cm)

If we take an image of an object at unknown distance, we can measure the distance
and the depth from figure 4 by computing the Lipschitz exponent $\alpha$ (within
a certain range).
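
Reading the distance off the curve can be mimicked in code by interpolating the calibration points of Table 1. The sketch below uses piecewise linear interpolation, which is an assumption made here for illustration (the paper only shows the curve), and the function and constant names are hypothetical. Values of the exponent outside the calibrated range are rejected rather than extrapolated.

```python
# (alpha, distance in cm) pairs taken from Table 1, in increasing alpha.
CALIBRATION = [(0.24, 30.0), (0.31, 35.0), (0.50, 40.0), (0.64, 45.0), (0.65, 50.0)]


def distance_from_alpha(alpha, table=CALIBRATION):
    """Piecewise linear lookup of the distance for a measured Lipschitz exponent."""
    if not table[0][0] <= alpha <= table[-1][0]:
        raise ValueError("alpha is outside the calibrated range")
    for (a0, d0), (a1, d1) in zip(table, table[1:]):
        if a0 <= alpha <= a1:
            return d0 + (d1 - d0) * (alpha - a0) / (a1 - a0)
```

The last two calibration points differ by only 0.01 in the exponent, so near 45 to 50 cm a small measurement error in the exponent translates into a large error in distance.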

5 Conclusion 

The results of the experiment support the validity of the proposed method. This
new approach has the following advantages:

With the new approach above we can get the depth information of different objects
in a scene. The equipment is simple (only a grayscale camera system) and the
measurement is easy to realize.

The method can be used in image segmentation. We can use the method above to
distinguish the extent of blur of the edges and thereby realize image
segmentation. Different kinds of edges and contours can be chosen and segmented
according to prior knowledge about the edges of the objects.

References 

1. Zheng Nanning, Computer Vision and Pattern Recognition, Defence and Industry
Press, China (1998), pp. 169-191.

2. Zhang Huobao, Multiresolution Edge Extraction Based on Orthowavelet, Chinese
Transaction on Image and Graphics, Vol. 3, No. 8, 1998, pp. 651-654.

3. Canny J, A Computational Approach to Edge Detection, IEEE T-PAMI, Vol. 8,
No. 6, 1986, pp. 679-698.

4. Sridhar R. Kundur, Novel Active Vision-Based Visual Threat Cue for
Autonomous Navigation Tasks, Computer Vision and Image Understanding,
Vol. 73, No. 2, 1999, pp. 169-182.

5. Mallat S, Zhong S, Characterization of Signals from Multiscale Edges, IEEE
T-PAMI, Vol. 14, No. 7, 1992, pp. 710-732.

6. Mallat S, Hwang W, Singularity Detection and Processing with Wavelets, IEEE
Transactions on Information Theory, Vol. 38, No. 2, 1992, pp. 617-643.

7. A. N. Rajagopalan and S. Chaudhuri, An MRF Model-Based Approach to
Simultaneous Recovery of Depth and Restoration from Defocused Images, IEEE
T-PAMI, Vol. 21, No. 7, 1999, pp. 578-589.



Construction of Finite Non-separable 
Orthogonal Filter Banks with Linear Phase and 
Its Application in Image Segmentation 



Hanlin Chen¹ and Silong Peng²

¹ Inst. of Math., Academia Sinica
100080, Beijing, PRC
chen@math.03.math.ac.cn
² NADEC, Inst. of Automation, Academia Sinica
100080, Beijing, PRC
pengsl@nadec.ia.ac.cn



Abstract. In [7], a large class of bivariate finite orthogonal wavelet
filters was constructed. In this paper, we propose a more general expression
of the filter bank with linear phase, which we call the standard method.
Besides this, a non-standard method is also presented. An interesting example
is also given. By using this non-separable wavelet filter bank, we
present a novel method of segmenting an image into two parts: one part
is texture with a special property and the other part is an image that is
piecewise smooth in some sense.



1 Introduction 

Recently, many researchers have been working on non-separable wavelets (see
[1, 2, 3, 4, 7, 8] and the references therein). In [7], a large class of bivariate
compactly supported orthogonal symmetric wavelet filters (low-pass and high-pass)
with arbitrary length is presented in explicit form.

In this paper, we give another two methods of constructing bivariate compactly
supported orthogonal symmetric wavelet filters. The standard method in this paper
is similar to that of [7], but its result is more general. We prove that the
non-standard method is included in the standard method. The standard method is
introduced in the next section. The non-standard method is given in section 3. A
simple image segmentation method using the filters is given in section 4.

2 Standard Method 

Let $\{V_j\}$ be a two-dimensional MRA. Then there exists a function
$m_0(\xi, \eta)$ ($\xi, \eta \in R$) such that
$\hat\varphi(2\xi, 2\eta) = m_0(\xi, \eta)\hat\varphi(\xi, \eta)$, where
$\hat\varphi$ is the Fourier transform of $\varphi$, and $m_0$ is called the
Symbol Function of the scaling function $\varphi$. The orthogonality of
$\{\varphi(x - j, y - k)\}_{j,k \in Z}$ implies that $m_0$ satisfies

$$|m_0(\xi,\eta)|^2 + |m_0(\xi+\pi,\eta)|^2 + |m_0(\xi,\eta+\pi)|^2 + |m_0(\xi+\pi,\eta+\pi)|^2 = 1. \qquad (1)$$

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 223-229, 2001.
© Springer-Verlag Berlin Heidelberg 2001






If a trigonometric polynomial $m_0(\xi, \eta)$ satisfies (1) and $m_0(0,0) = 1$,
we call $m_0$ an Orthogonal Lowpass Wavelet Filter. Assume that $m(x,y)$ is a
polynomial in $x$ and $y$ with real coefficients. Rewrite $m(x,y)$ in its
polyphase form as

$$m(x,y) = f_1(x^2,y^2) + x f_2(x^2,y^2) + y f_3(x^2,y^2) + xy f_4(x^2,y^2). \qquad (2)$$

It is easy to see that $m(e^{i\xi}, e^{i\eta})$ satisfying (1) is equivalent to

$$\sum_{\nu=1}^{4} |f_\nu(e^{i\xi}, e^{i\eta})|^2 = \frac{1}{4}, \qquad \xi, \eta \in R. \qquad (3)$$

In this paper, all polynomials have real coefficients by default.
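
The equivalence between the orthogonality condition (1) and the polyphase condition can be checked in a few lines. The following derivation is a sketch added for the reader; the shorthand $x = e^{i\xi}$, $y = e^{i\eta}$ and $F_\nu = f_\nu(x^2, y^2)$ is introduced only here.

```latex
% Shifting \xi (resp. \eta) by \pi replaces x by -x (resp. y by -y) and leaves
% x^2, y^2, hence every F_\nu, unchanged. The left side of (1) is therefore
\[
\sum_{s_1, s_2 \in \{\pm 1\}}
  \bigl| F_1 + s_1 x F_2 + s_2 y F_3 + s_1 s_2\, x y F_4 \bigr|^2
  \;=\; 4 \sum_{\nu=1}^{4} |F_\nu|^2 ,
\]
% because |x| = |y| = 1 and every cross term carries a factor s_1, s_2 or
% s_1 s_2, each of which sums to zero over the four sign choices.
% Hence (1) holds if and only if
\[
\sum_{\nu=1}^{4} \bigl| f_\nu(e^{2i\xi}, e^{2i\eta}) \bigr|^2 = \frac{1}{4}
\quad \text{for all } \xi, \eta \in R ,
\]
% which is the polyphase condition after renaming 2\xi, 2\eta.
```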

In some applications, it is better to use a filter with linear phase than a filter
with nonlinear phase. It is well known that in the one-dimensional case there
does not exist an orthogonal filter with linear phase besides the Haar filter.
But in the higher-dimensional case, we can find many filters with linear phase.

Definition 1. Given a polynomial $m(x,y)$, if

$$\overline{m(e^{i\xi}, e^{i\eta})} = e^{-i(M_1\xi + M_2\eta)}\, m(e^{i\xi}, e^{i\eta}), \qquad (4)$$

where $M_1$ and $M_2$ are positive integers, then we say that
$m(e^{i\xi}, e^{i\eta})$ has Linear Phase.

For our purpose, we introduce a kind of matrix transform. For a matrix $A$
of size m x m, define $A^s := H_m A H_m$, where $H_m = (h_{kl})_{k,l=1}^{m}$ is
the matrix of size m x m with $h_{kl} = 1$ when $k + l = m + 1$ and 0 otherwise.
Moreover, denote by $\mathcal{U}_2$ the set of all real unitary matrices of size
4 x 4. The following theorem gives a large class of symmetric wavelet filters.

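Theorem 1. Let

$$m(x,y) = \pm\frac{1}{4}\, X \cdot \prod_{\mu=1}^{N} \left( U_\mu D_\mu(x^2,y^2) U_\mu^T \right) \cdot V_0, \qquad (5)$$

where $U_\mu \in \mathcal{J}_2 := \{U \mid U \in \mathcal{U}_2,\ U = U^s\}$ and
$D_\mu(x,y) = \mathrm{diag}\{1, x, 1, x\}$ or $\mathrm{diag}\{1, y, 1, y\}$, for
$\mu = 1, \dots, N$; and $V_0 = (1\ 1\ 1\ 1)^T$, $X = (1\ x\ y\ xy)$. Then
$m(e^{i\xi}, e^{i\eta})$ is a symmetric filter and satisfies (1).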

Proof. The proof is direct. 

Remark 1. Although the form of (5) is similar to that of [7], this form is more
general. In fact, the filters given by the non-standard method later are included
in this form, but not in the expression of [7].






3 Non-standard Method 

In this section, we introduce a new method to construct finite orthogonal
symmetric wavelet filters. Given a polynomial $m(x,y)$ with real coefficients,
let

$$m(x,y) = f_1(x^2,y^2) + x f_2(x^2,y^2) + y f_3(x^2,y^2) + xy f_4(x^2,y^2). \qquad (6)$$

If

$$\overline{m(e^{i\xi}, e^{i\eta})} = \pm e^{-i((2M_1+1)\xi + (2M_2+1)\eta)}\, m(e^{i\xi}, e^{i\eta}), \qquad (7)$$

that is $m(x,y) = \pm x^{2M_1+1} y^{2M_2+1} m(\frac{1}{x}, \frac{1}{y})$, then we
obtain

$$f_1(x,y) = \pm x^{M_1} y^{M_2} f_4\left(\frac{1}{x}, \frac{1}{y}\right), \qquad f_2(x,y) = \pm x^{M_1} y^{M_2} f_3\left(\frac{1}{x}, \frac{1}{y}\right). \qquad (8)$$

If $m(e^{i\xi}, e^{i\eta})$ satisfies (1), then $f_1$, $f_2$, $f_3$ and $f_4$
satisfy (3). Substitute (8) into (3) to obtain

$$|f_1(e^{i\xi}, e^{i\eta})|^2 + |f_2(e^{i\xi}, e^{i\eta})|^2 = \frac{1}{8}. \qquad (9)$$

Conversely, if two polynomials $f_1$ and $f_2$ satisfy (9), then (8) gives $f_3$
and $f_4$ such that we get a finite orthogonal symmetric wavelet filter. The
following theorem gives a large class of solutions of (9).

Theorem 2. Let

$$(f_1(x,y)\ \ f_2(x,y))^T = \frac{1}{4} \prod_{\mu=1}^{N} \left( A_\mu E_\mu A_\mu^T \right) \cdot (1\ 1)^T, \qquad (10)$$

where $A_\mu$ is any real unitary matrix of size 2 x 2 and
$E_\mu = \mathrm{diag}(1, x)$ or $\mathrm{diag}(1, y)$, for $\mu = 1, \dots, N$.
Then $f_1$ and $f_2$ satisfy (9).

Proof. Since the $A_\mu$ are unitary matrices and the $E_\mu$ are paraunitary
matrices, the conclusion follows immediately.
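
The statement of Theorem 2 can be checked numerically. The sketch below assumes NumPy, takes each factor to have the form A E Aᵀ with A a plane rotation (one particular real unitary matrix), and uses a normalizing factor of 1/4, so that the squared moduli of the two polyphase components should add up to 1/8 on the unit circle. These choices and the function names are assumptions made for illustration, not a transcription of the authors' construction.

```python
import numpy as np


def rotation(theta):
    """A 2 x 2 real unitary (orthogonal) matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def polyphase_pair(thetas, variables, x, y):
    """Evaluate (f1, f2) = 1/4 * prod_mu (A_mu E_mu A_mu^T) (1, 1)^T at (x, y).

    thetas:    rotation angles defining A_1, ..., A_N
    variables: "x" or "y" for each factor, selecting E = diag(1, x) or diag(1, y)
    """
    v = np.array([1.0, 1.0], dtype=complex)
    # The rightmost factor of the product acts on the vector first.
    for theta, var in reversed(list(zip(thetas, variables))):
        A = rotation(theta)
        E = np.diag([1.0, x if var == "x" else y])
        v = A @ (E @ (A.T @ v))
    return v / 4.0
```

On the unit circle every factor preserves the Euclidean norm, so the result always has squared norm 2/16 = 1/8, whatever angles are chosen.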

The non-standard method looks different from the standard method, but in fact all
filters resulting from the non-standard method can be constructed by the standard
method, as stated in the following theorem.

Theorem 3. Let

$$m(x,y) = f_1(x^2,y^2) + x f_2(x^2,y^2) + y f_3(x^2,y^2) + xy f_4(x^2,y^2),$$

where $f_1(x,y)$ and $f_2(x,y)$ satisfy (10) and $m(x,y)$ satisfies (7). Then we
have

$$m(x,y) = \frac{1}{4}\, X \cdot \prod_{\mu=1}^{N} \left( U_\mu D_\mu(x^2,y^2) U_\mu^T \right) \cdot V_0, \qquad (11)$$

where

$$U_\mu = \ldots, \qquad D_\mu(x,y) = \begin{pmatrix} E_\mu(x,y) & 0 \\ 0 & E_\mu(x,y) \end{pmatrix}.$$





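By using this construction method, we can construct the following non-separable
filter banks. The lowpass filter is:

$$\frac{1}{8}\begin{pmatrix} 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 \end{pmatrix}$$

The three highpass filters are

$$\frac{1}{8}\begin{pmatrix} 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & -1 & -1 \\ -1 & -1 & 1 & -1 \end{pmatrix},\quad
\frac{1}{8}\begin{pmatrix} -1 & 1 & 1 & 1 \\ -1 & -1 & 1 & -1 \\ 1 & -1 & 1 & 1 \\ -1 & -1 & -1 & 1 \end{pmatrix},\quad
\frac{1}{8}\begin{pmatrix} 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ -1 & -1 & -1 & 1 \end{pmatrix}$$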



4 Image Segmentation with Non-separable Symmetric 
Filter 



The filters given in the previous section are very good in some sense: their
elements are 1 or -1 (if we omit the factor 1/8), which is useful in computation;
they all have linear phase, two of them being symmetric and the other two
anti-symmetric. These filters have poor regularity in contrast with the
well-known biorthogonal 9/7 wavelets. They act as derivative operators, like the
Sobel operator. The following examples illustrate this fact.




Fig. 1 . Original image 



Image segmentation is important in many applications such as image compression
and computer vision. In some applications, it is useful to segment an image into
two parts: one part consists of regions with dense edges, and the other of
regions with few edges. In general, a sub-area of an image with dense edges may
be texture, which is difficult to process in applications such as compression. By
using the derivative property of the filters, we present a novel method of
segmentation.

In an area which may be texture, the distances between edges are short. In
addition, edges of all directions are always present, which means that for each
channel of the above filters, various edges will appear in the texture region. On
the other hand, only one kind of edge appears near the edge of a piecewise smooth
area. We utilize this fact to segment an image.

Fig. 2. Filtering result of the first high-pass filter. Left: greater than 16.
Right: smaller than -16

Fig. 3. Filtering result of the second high-pass filter. Left: greater than 16.
Right: smaller than -16

Here we have three high-pass filters, which we call three channels. In this
algorithm, we do not down-sample; we just convolve the image with the filters.

Suppose B is the channel currently being processed. Let BP1 = (B > th1) and
BP2 = (B > th2), where th1 > th2 are two positive numbers. In BP2, we find all
the connected areas which contain at least one point of BP1; all these areas
together give BP. Similarly, using the thresholds -th1 and -th2, we obtain BM, in
which every point has a negative value in B.
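
The construction of BP is a two-threshold (hysteresis) selection, which can be sketched as a flood fill. The pure-Python sketch below is illustrative only: the function name is hypothetical, the channel is a list of rows, and 8-connectivity is assumed because the text does not say which neighbourhood is used.

```python
from collections import deque


def hysteresis_mask(B, th1, th2):
    """Keep the regions of B > th2 that contain at least one point with B > th1.

    B: 2D list of filter responses; th1 > th2 > 0. Returns a 2D list of bools.
    """
    rows, cols = len(B), len(B[0])
    weak = [[B[i][j] > th2 for j in range(cols)] for i in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    # Seed the fill with the strong points (B > th1).
    queue = deque((i, j) for i in range(rows) for j in range(cols) if B[i][j] > th1)
    for i, j in queue:
        out[i][j] = True
    # Grow each seed through neighbouring weak points (8-connectivity assumed).
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and weak[ni][nj] and not out[ni][nj]:
                    out[ni][nj] = True
                    queue.append((ni, nj))
    return out
```

The mask BM for the negative responses can be obtained with the same function applied to the negated channel.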

Let DBP be a matrix of the same size as B, whose elements are

$$DBP(p) = \begin{cases} 0, & BP(p) = 0 \\ d(p, BM), & \text{otherwise} \end{cases}$$

where p = (i, j) is a point and d(p, BM) is the distance from p to the nearest
nonzero point in BM. Similarly, we can define DBM.

If a point in B is located in the texture area, then at least one of the
corresponding values in DBP and DBM is small. Let SB be a matrix which indicates,
at each point, whether both values in DBP and DBM are smaller than a given
threshold th. Of course, the selection of th depends on the scale of the texture
one prefers, and the segmentation result depends on the selection of the
threshold.

Let S1, S2 and S3 be the corresponding segmentation results of the three channels
respectively. Then we can segment roughly as S = (S1 + S2 + S3 > 1), which means
that, if a point is located in the texture area, then it will appear in at least
two of the channels.

Fig. 4. Filtering result of the third high-pass filter. Left: greater than 16.
Right: smaller than -16

Finally, to obtain a true area, we need to do some small dilation and erosion.
A segmentation example using the above filters and a well-known image is given
as follows.

Fig. 5. Original Image

Fig. 6. Segmentation result (th = 20)

References

1. I. Daubechies, Ten Lectures on Wavelets, CBMS, 61, SIAM, Philadelphia, 1992.

2. Wenjie He and Mingjun Lai, Construction of Bivariate Compactly Supported
Biorthogonal Box Spline Wavelets with Arbitrarily High Regularities, Applied
Comput. Harmonic Analysis, 6(1999) 53-74.

3. Wenjie He and Mingjun Lai, Examples of Bivariate Nonseparable Compactly
Supported Orthonormal Continuous Wavelets, Wavelet Applications in Signal and
Image Processing IV, Proceedings of SPIE, 3169(1997) 303-314.

4. J. Kovacevic and M. Vetterli, Nonseparable multidimensional perfect
reconstruction filter banks and wavelet bases for $R^n$, IEEE Trans. on
Information Theory, 38, 2(1992) 533-555.

5. S. Mallat, Review of Multifrequency Channel Decomposition of Images and
Wavelet Models, Technical report 412, Robotics Report 178, New York Univ.,
(1988).

6. Y. Meyer, Principe d'incertitude, bases hilbertiennes et algèbres
d'opérateurs, Séminaire Bourbaki 662, 1985-86, Astérisque (Société Mathématique
de France).

7. Silong Peng, Construction of Two Dimensional Compactly Supported Orthogonal
Wavelet Filters with Linear Phase, (to appear in ACTA Mathematica Sinica),
(1999).

8. Silong Peng, Characterization of Separable Bivariate Orthonormal Compactly
Supported Wavelet Basis, (to appear in ACTA Mathematica Sinica), (1999).

9. Silong Peng, N dimensional Compactly Supported Orthogonal Wavelet Filters, (to
appear in J. of Computational Mathematics), (1999).



Mixture-State Document Segmentation 
Using Wavelet-Domain Hidden Markov Tree Models 



Yuan Y. Tang¹, Yuhua Hou², Jinping Song², and Xiaoyi Yang²

¹ Department of Computer Science, Hong Kong Baptist University
Hong Kong
yytang@comp.hkbu.edu.hk
² Department of Mathematics, Henan University
Kaifeng, 475001, China
houyuhua@mail.henu.edu.cn


Abstract. In this paper we introduce a mixture-state document
segmentation method based on wavelets and hidden Markov tree
(HMT) models. First we propose a three-state HMT segmentation
method that is similar to that in [1]. Then, by comparing the
different weights in the three-density Gaussian mixture
distributions of different textures, we find that background, text and
image can be well approximated by one-state, two-state and
three-state HMT models respectively. This gives a new segmentation method,
mixture-state HMT segmentation. Experiments with scanned document
images indicate that the new approach improves the segmentation
accuracy over the raw segmentation in [1].



1 Three-State HMT Segmentation 



1.1 Two-State HMT Segmentation 

Document segmentation by wavelet-domain hidden Markov tree (HMT) methods has been
considered in many papers, such as [1,2], in which the document is divided into
classification blocks and the class of each block is decided independently. For
example, Hyeokho Choi and Richard Baraniuk [1] proposed a multiscale document
segmentation algorithm that divides the document into dyadic squares at different
scales. The dyadic squares at a given scale are obtained simply by recursively
dividing the document into four square subdocuments of equal size. In this way,
every parent square has four child squares, and all dyadic squares form a
convenient quad-tree structure. Under the simplest 2-D Haar wavelet transform, the
wavelet coefficient matrices $w^{LH}$, $w^{HL}$, $w^{HH}$ at different scales lead
naturally to a quad-tree structure on the wavelet coefficients in each subband, and
each wavelet coefficient node corresponds to a related dyadic image square.
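The quad-tree indexing and one level of the 2-D Haar transform described above can be sketched as follows. This is only a minimal illustration, not the implementation of [1]: the names `haar2d_level`, `parent` and `children`, the orthonormal scaling by 1/2, and the assignment of the labels LH and HL to the two mixed subbands are assumptions (conventions differ between authors).

```python
def haar2d_level(img):
    """One level of a 2-D Haar transform of a square image (list of rows)
    with even side length. Returns (LL, LH, HL, HH); each output has half
    the side length of the input."""
    half = len(img) // 2
    LL = [[0.0] * half for _ in range(half)]
    LH = [[0.0] * half for _ in range(half)]
    HL = [[0.0] * half for _ in range(half)]
    HH = [[0.0] * half for _ in range(half)]
    for r in range(half):
        for c in range(half):
            # The 2 x 2 block of pixels covered by coefficient (r, c).
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            LL[r][c] = (a + b + d + e) / 2.0
            LH[r][c] = (a + b - d - e) / 2.0  # assumed label: difference between rows
            HL[r][c] = (a - b + d - e) / 2.0  # assumed label: difference between columns
            HH[r][c] = (a - b - d + e) / 2.0  # diagonal difference
    return LL, LH, HL, HH


def parent(r, c):
    """Index of the parent node, one scale coarser."""
    return (r // 2, c // 2)


def children(r, c):
    """Indices of the four child nodes, one scale finer."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
```

Applying `haar2d_level` again to the returned `LL` gives the next coarser scale, so the coefficient at `(r, c)` of a subband has its parent at `parent(r, c)` in the same subband one scale coarser.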



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 230-236, 2001. 
Springer-Verlag Berlin Heidelberg 2001







Following the notation of [1], we denote a dyadic square at scale $j$ by $d_i^j$ (with
$i$ an abstract index enumerating the squares at scale $j$). The two extremes $d^0$
and $d_i^J$ are the root and the leaves of the tree, respectively. Let $w_i$ denote a
generic wavelet coefficient of some subband tree at a certain scale (for simplicity
we sometimes refer to the node just by $i$). Define $\rho(i)$ to be the parent of
node $i$. Given a subband, define $T_i$ to be the subtree of wavelet coefficients
with root node $i$.

In practice the probability density function (pdf) $f(w_i)$ is unknown. The paper [1]
associates with each coefficient a discrete hidden state $S_i$ that takes two values
and connects the hidden states in a directed Markov-1 probabilistic graph. Conditioned
on $S_i = r$ ($r \in \{S, L\}$), each wavelet coefficient is Gaussian, so the pdf
$f(w_i)$ is approximated by a two-density Gaussian mixture model

$$ f(w_i) = \sum_{r \in \{S,L\}} p_{S_i}(r)\, f(w_i \mid S_i = r) \tag{1} $$

where $f(w_i \mid S_i = r) \sim N(\mu_{i,r}, \sigma_{i,r}^2)$, and $p_{S_i}(S) + p_{S_i}(L) = 1$.
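As an illustration of Eq. (1), the sketch below evaluates a Gaussian mixture pdf for one wavelet coefficient. The function names and the numerical parameter values are hypothetical and chosen only for the example. The same function covers any number of densities, including the three-density case used later in the paper.

```python
import math


def gaussian_pdf(w, mean, var):
    """Density of N(mean, var) evaluated at w."""
    return math.exp(-(w - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(w, weights, means, variances):
    """f(w) = sum_r p(r) * N(w; mean_r, var_r); the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * gaussian_pdf(w, m, v)
               for p, m, v in zip(weights, means, variances))


# Hypothetical two-state example: a "small" state with low variance and a
# "large" state with high variance, both zero mean.
value = mixture_pdf(0.5, weights=[0.7, 0.3], means=[0.0, 0.0], variances=[1.0, 25.0])
```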

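Denote by $\Theta^{LH}$, $\Theta^{HL}$, $\Theta^{HH}$ the parameter sets for the LH, HL and HH
subbands, and let $M := \{\Theta^{LH}, \Theta^{HL}, \Theta^{HH}\}$. Define
$\beta_i(r) := f(T_i \mid S_i = r, \Theta)$, and denote

$$ f(T_i \mid \Theta) = \sum_{r \in \{S,L\}} \beta_i(r)\, p(S_i = r \mid \Theta) \tag{2} $$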

The overall likelihood of a dyadic square can be computed by

$$ f(d_i \mid M) = f(T_i^{LH} \mid \Theta^{LH})\, f(T_i^{HL} \mid \Theta^{HL})\, f(T_i^{HH} \mid \Theta^{HH}) \tag{3} $$

The raw HMT segmentation method [1] is then given by

$$ \hat{c}_i^{ML} := \arg\max_{c \in \{1,2,\ldots,N_c\}} f(d_i \mid M_c) \tag{4} $$

where $N_c$ is the number of class labels, and $M_c$ denotes the parameter set $M$
for class label $c$.
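The decision rule of Eqs. (3) and (4) can be sketched as follows, assuming the three subband likelihoods of a dyadic square have already been computed for every class. Working with log-likelihoods (so that the product in Eq. (3) becomes a sum) is a common numerical choice and an assumption here, as are the function name and the example values.

```python
def classify_square(loglik_by_class):
    """Maximum-likelihood class label for one dyadic square.

    loglik_by_class maps each class label c to the three subband
    log-likelihoods (LH, HL, HH) of the square under the model M_c.
    """
    # Eq. (3): the product of the subband likelihoods is a sum of logs.
    scores = {c: sum(logs) for c, logs in loglik_by_class.items()}
    # Eq. (4): pick the class with the largest overall likelihood.
    return max(scores, key=scores.get)


# Hypothetical log-likelihood values for one square.
label = classify_square({
    "background": (-30.0, -28.0, -35.0),
    "text": (-10.0, -12.0, -9.0),
    "image": (-11.0, -11.0, -11.0),
})
```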



1.2 Three-State HMT Segmentation 

In this paper, we associate with each coefficient a discrete hidden state $S_i$ that
takes three values $r \in \{S, M, L\}$ with probability mass function (pmf)
$p_{S_i}(r)$. Conditioned on $S_i = r$, $w_i$ is Gaussian with mean $\mu_{i,r}$ and
variance $\sigma_{i,r}^2$, so that $w_i$ follows a three-density Gaussian mixture
model. Thus, the overall pdf is given by







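$$ f(w_i) = \sum_{r \in \{S,M,L\}} p_{S_i}(r)\, f(w_i \mid S_i = r) \tag{5} $$

where $f(w_i \mid S_i = r) \sim N(\mu_{i,r}, \sigma_{i,r}^2)$ and
$p_{S_i}(S) + p_{S_i}(M) + p_{S_i}(L) = 1$. For each parent-child pair of hidden
states $\{S_{\rho(i)}, S_i\}$ the state transition probability matrix becomes

$$ \begin{pmatrix}
\epsilon_{i,S}^{\rho(i),S} & \epsilon_{i,M}^{\rho(i),S} & \epsilon_{i,L}^{\rho(i),S} \\
\epsilon_{i,S}^{\rho(i),M} & \epsilon_{i,M}^{\rho(i),M} & \epsilon_{i,L}^{\rho(i),M} \\
\epsilon_{i,S}^{\rho(i),L} & \epsilon_{i,M}^{\rho(i),L} & \epsilon_{i,L}^{\rho(i),L}
\end{pmatrix} \tag{6} $$

where $\epsilon_{i,n}^{\rho(i),m}$ is the probability that node $i$ is in state $n$
given that its parent $\rho(i)$ is in state $m$. Each row sums to 1, so that, for
example, $\epsilon_{i,L}^{\rho(i),m} = 1 - \epsilon_{i,S}^{\rho(i),m} - \epsilon_{i,M}^{\rho(i),m}$.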



Accordingly, formula (2) turns into

$$ f(T_i \mid \Theta) = \sum_{r \in \{S,M,L\}} \beta_i(r)\, p(S_i = r \mid \Theta) \tag{7} $$

Similar to [1], we can obtain a three-state HMT segmentation method through Eqs.
(3)-(7).
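The subtree likelihood in Eq. (7) is typically evaluated with the upward recursion of the HMT framework [2]: the quantity $\beta_i(m)$ of a node is the Gaussian density of its own coefficient times, for every child, the transition-weighted sum of the child's $\beta$ values. The sketch below is a minimal illustration under simplifying assumptions that are not taken from the paper: all nodes share the same means, variances and transition matrix, a tree node is a hypothetical pair `(w, children)`, and no scaling against numerical underflow is applied.

```python
import math


def gaussian_pdf(w, mean, var):
    """Density of N(mean, var) evaluated at w."""
    return math.exp(-(w - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def subtree_betas(node, means, variances, trans):
    """Return [beta(m) for each state m], beta(m) = f(T_i | S_i = m).

    node  : (w, children), where children is a list of nodes
    trans : trans[m][n] = P(child state = n | parent state = m)
    """
    w, kids = node
    kid_betas = [subtree_betas(k, means, variances, trans) for k in kids]
    betas = []
    for m in range(len(means)):
        value = gaussian_pdf(w, means[m], variances[m])
        for kb in kid_betas:
            # Sum over the child's states, weighted by the transition row of m.
            value *= sum(trans[m][n] * kb[n] for n in range(len(kb)))
        betas.append(value)
    return betas


def tree_likelihood(root, root_pmf, means, variances, trans):
    """Eq. (7): f(T | Theta) = sum_m beta_root(m) * p(S_root = m)."""
    betas = subtree_betas(root, means, variances, trans)
    return sum(p * b for p, b in zip(root_pmf, betas))
```

The number of states is set by the length of `means`, so the same routine serves the two-state model of Eq. (2) and the three-state model of Eq. (7). For deep trees the products become very small, and a practical implementation would work with scaled or logarithmic quantities.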

Experiments on scanned document images indicate that the three-state approach
somewhat improves the segmentation accuracy over the raw segmentation in
[1], as shown in Fig. 1.

In all segmentation results, white, gray, and black represent background, text
and image regions, respectively.




Fig. 1a,b. (a) 512 × 512 document image used for training the HMT models for text,
image, and background textures, (b) Original 512 × 512 document image to be segmented








Fig. 1c. Segmentation result by two-state HMT segmentation

Fig. 1d. Segmentation result by using the proposed three-state HMT segmentation algorithm



2 Mixture-State HMT Segmentation 

In the course of training three-state HMT models in many experiments, we found
that the weights, i.e. the values of the pmf $p_{S_i}(r)$, of the three-density
Gaussian mixture distributions of the different textures are relatively stable.
This can be seen in Figs. 2 and 3.




Fig. 2. The weights $p_{S_i}(r)$ of the three-density Gaussian mixture distributions of
background, text and image (one panel each) in the HH subband at scale $J$, the finest scale











Fig. 3. The weights $p_{S_i}(r)$ of the three-density Gaussian mixture distributions of
background, text and image (one panel each) in the HH subband at scale $J - 1$





At the finest scale in the HH subband, Fig. 2 shows that the three weights are

0.8907, 0.0502, 0.0591

for background,

0.0889, 0.4175, 0.4936

for text, and

0.1627, 0.3032, 0.5341

for image.

Similar results are obtained in other subbands and at other scales. This suggests
that background, text and image can be well approximated by one-density,
two-density and three-density Gaussian mixture models, respectively.
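One simple way to read the number of significant densities off the weights quoted above is to count the weights that exceed a threshold. The threshold of 0.1 and the function name are assumptions made for this sketch; the paper does not state such a rule explicitly.

```python
# Weights of the three-density mixture at the finest scale, HH subband (Fig. 2).
WEIGHTS = {
    "background": (0.8907, 0.0502, 0.0591),
    "text": (0.0889, 0.4175, 0.4936),
    "image": (0.1627, 0.3032, 0.5341),
}


def effective_states(weights, threshold=0.1):
    """Number of mixture components whose weight is at least the threshold."""
    return sum(1 for p in weights if p >= threshold)


states_per_class = {name: effective_states(w) for name, w in WEIGHTS.items()}
```

With this assumed threshold the counts are 1 for background, 2 for text and 3 for image, which agrees with the one-, two- and three-density approximation suggested in the text.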

With this idea, we propose a new HMT document segmentation algorithm, the
mixture-state HMT segmentation, which approximates background with a one-density
Gaussian model, text with a two-density Gaussian mixture model and image with a
multi-density Gaussian mixture model (specifically, a three-density model in this
paper). In fact, we can regard multi-density Gaussian mixture models as the set

$$ \Big\{ \sum_{r} p_{S_i}(r)\, f(w_i \mid S_i = r) \Big\}, $$

where some of the weights $p_{S_i}(r)$ may be 0. In this way we obtain one-density,
two-density, three-density Gaussian mixture models and so on. To obtain the
segmentation of an original picture, we first train HMT models for background with
one state, text with two states and image with three states to obtain the parameter
sets $M$. Then the likelihood of the coefficients in the subtree $T_i$ can be
computed by

$$ f(T_i \mid \Theta) = \sum_{r} \beta_i(r)\, p(S_i = r \mid \Theta) \tag{8} $$

where $r$ stands for the state. Finally, we obtain the segmentation result by
using formulas (3) and (4).







Experiments with scanned document images, shown in Fig. 4, indicate that the new
approach, the mixture-state HMT segmentation, improves the segmentation accuracy
considerably over both the raw segmentation of Choi and Baraniuk [1] and the
three-state HMT segmentation proposed in Section 1.2.

Furthermore, since the new method models background, text and image with Gaussian
mixture distributions of different numbers of densities, the training and testing
processes are simpler than with the multi-state HMT segmentation method. Our
algorithm thus offers improved segmentation accuracy with a lower computational
burden compared with the raw segmentation in [1].




Fig. 4. (a) 512 × 512 document image used for training the HMT models for background,
text, and image textures, (b) Original 512 × 512 document image to be segmented,
(c) Segmentation in 2 × 2 block size by using two-state HMT segmentation, three-state HMT
segmentation and mixture-state HMT segmentation respectively, (d) Segmentation in 4 × 4
block size, (e) Segmentation in 8 × 8 block size













References 

1. H. Choi and R. G. Baraniuk, Multiscale Document Segmentation using Wavelet-Domain
Hidden Markov Models, Science & Technology, Jan. 2000.

2. M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, Wavelet-Based Statistical 
Signal Processing using Hidden Markov Models, IEEE Trans. Signal Proc. 46, 
April 1998. 




Some Experiment Results on Feature Analyses of Stroke 
Sequence Free Matching Algorithms for On-Line 
Chinese Character Recognition 



Tak Ming Law 

Hong Kong Institute of Vocational Education (Morrison Hill) 
Department of Computing, 6 Oi Kwan Road, Wan Chai, Hong Kong 
tmlaw@vtc.edu.hk



Abstract. We have built several trial programs to test our assumptions
about improving the speed and accuracy of on-line Chinese character
recognition. This paper describes an on-line Chinese character database
that is built upon an analysis of character segments. The structure of
the database therefore covers only the features of individual segments
and the relations used to distinguish one character from another. The
stroke sequence free algorithm checks the stroke segments iteratively
until all segments of the input and reference characters have been
evaluated. Therefore, even if users write the characters against the
stroke order rules, the system is still able to obtain the correct
result. Since the distance measure algorithm depends only on the
features of segments, the system need not distinguish radicals and
structure within characters.



1 Introduction 

In order to improve the convenience and speed of on-line Chinese character recognition,
a segment-based Chinese character database (presented in [1,2]) is applied to
achieve stroke sequence free matching. Our approach puts all the important
features of each segment of a character into the database, which shortens character
retrieval time in the matching stage and automatically solves the problem of input
in an incorrect stroke sequence. In other words, a stroke sequence free algorithm
checks the segments iteratively until all segments are matched, so users do not
need to write the characters according to stroke order. The accuracy and speed of
recognition are increased by the completeness of the features in the dictionary
database and the efficiency of the stroke sequence free matching algorithm. The
recognition stage filters out as many inappropriate candidates as possible. The
preliminary match stage further reduces the number of candidates for the final
detailed match in the recognition stage.

This paper has been condensed into a very brief summary and is therefore not fully
self-contained. The remainder of this paper is organized as follows: in the next

Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 237-241, 2001. 

Springer- Verlag Berlin Heidelberg 2001 







section, the concept of a character segment is briefly introduced; in Section 3,
the stroke sequence free matching algorithm is described; and finally,
some experimental results and a discussion are given in Section 4.



2 Segments 

In our database (presented in [1]), for simplicity and performance, we consider
only five basic types of stroke, which are listed in the following table
(extracted from [3]).



Table 1. Five basic types of segment 



Stroke Name            Symbol
Horizontals            h
Verticals              v
South-West Slanting    p
South-East Slanting    n
Dot                    d

Each of the above five kinds of basic stroke has its own unique direction and no
turning points. In our system, we call these basic stroke types segment types,
and they are the basic elements of the whole Chinese character database. Our system
breaks all compound-segment strokes into segments, which are used as the elements
that represent the entire character. The features of each segment are analyzed and
placed into a single vector.

Segments compose a compound-segment stroke, so the stroke count of a particular
character may differ from its segment count. For example, a character that is
counted as 12 strokes in a regular Chinese database is counted as 16 segments in
our system, since it contains 4 compound-segment strokes that together consist of
8 segments.



3 Stroke Sequence Free Matching Algorithm 

Once the segment features of the input character have been obtained, the stroke
sequence free matching algorithm can be started. The solution was inspired by
[4]. The steps are as follows: (1) Each segment in the input character is
matched iteratively with all the segments in the reference character. (2) Each
iteration yields one best match pair between the input and reference characters. A
best match pair is a pair of two segments with the lowest distance (Fig. 1). (3) The
segments in the input character are matched iteratively with the reference character
one by one until all the segments have been processed and all best match pairs have
been set up (Fig. 2).









Sometimes two or more segments on one side are matched with only one segment on
the other side (Fig. 3). In this case there must be an incorrect match among those
matched pairs. The match strength of a pair is determined by the distance between
its two segments: the higher the distance ratio, the lower the match strength.
When this case occurs, the system iterates again and selects the best match pairs
according to the match strength between segments. Finally, all the best match
pairs within the character are rearranged as shown in Fig. 4.
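The order-free matching idea can be sketched as a greedy one-to-one assignment: all input-reference segment pairs are ranked by distance, and the closest pair whose two segments are both still unmatched is accepted, so a reference segment can never end up with two partners. This is a simplified stand-in for the iteration described above, not the author's algorithm. The segment representation `(symbol, x, y)`, the distance function and the type-mismatch penalty are hypothetical.

```python
import math


def segment_distance(a, b, type_penalty=10.0):
    """Distance between two segments given as (symbol, x, y).

    Assumed measure: Euclidean distance between positions, plus a fixed
    penalty when the segment types (h, v, p, n, d) differ.
    """
    (type_a, xa, ya), (type_b, xb, yb) = a, b
    dist = math.hypot(xa - xb, ya - yb)
    return dist if type_a == type_b else dist + type_penalty


def match_segments(inputs, refs):
    """Greedy order-free matching; returns {input index: reference index}."""
    ranked = sorted(
        (segment_distance(a, b), i, j)
        for i, a in enumerate(inputs)
        for j, b in enumerate(refs)
    )
    used_inputs, used_refs, matches = set(), set(), {}
    for _, i, j in ranked:
        # Accept the closest pair whose two segments are both still free.
        if i not in used_inputs and j not in used_refs:
            matches[i] = j
            used_inputs.add(i)
            used_refs.add(j)
    return matches


# The reference lists its segments in standard order; the input was written
# in a different order but is matched to the same segments.
reference = [("h", 0.0, 0.0), ("v", 5.0, 5.0), ("d", 9.0, 1.0)]
written = [("d", 9.0, 1.2), ("h", 0.1, 0.0), ("v", 5.0, 5.1)]
pairs = match_segments(written, reference)
```

Because the pairing depends only on segment features and not on the order in which the segments were written, the result is the same for any permutation of the input segments.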



4 Experiment Results and Analysis 

We built the segment database and tested it with the stroke sequence free matching
algorithm. The experiment combined the techniques described in [5],
[6], [7] and [8] into a single system. Some ideas for the testing were taken from [9].

An overall recognition rate of 98.2% was achieved, and the average recognition
time was less than 1/2 second per character on IBM-compatible PCs. This is a
closed test performed by the author himself, with limited cursive writing, in his
own laboratory. Although the result is writer dependent, it still shows that an
integrated recognition system using the proposed database is very promising. For
the experiment, a database of 1100 Chinese characters was constructed as the
dictionary and was trained five times during the data learning stage. The segment
counts of the characters range from 1 to 31. Figure 5 indicates that characters
with a large number of segments have higher recognition accuracy. Every character
in the database was tested once, with only one try per character.












Fig. 5. The accuracy flow between different segment numbers 



4.1 Conclusion 

Compared with other methods, the proposed method has the following advantages:

1. It saves time and computing resources during the recognition stage.

Since all the features are stored in the Chinese character dictionary, the system
only performs matching instead of performing both feature calculation and
matching for all candidates during the recognition stage.

2. It can recognize Chinese characters without radicals.

The recognition stage does not rely on radical detection. The system only
considers the features of each individual segment within the Chinese character
dictionary. The system is able to recognize all the characters in the dictionary,
including those without radicals.

3. Flexible adaptation

The system can adapt to variations in the stroke relationships.

4. Cursive writing handling

The proposed method can match cursive Chinese characters and can separate
similar Chinese characters.

5. Freedom from stroke-order and stroke-number variations.

The algorithm described in Section 3 disregards the stroke order: for each
stroke of the input character it selects the most similar stroke of the template,
instead of matching the strokes of the template and the input character one by one
in sequence. In addition, the system permits a certain tolerance for
stroke-number variations. In this experiment, we set the tolerance for
stroke-number variations to |2|.

6. Training for the user is not required.

The stroke sequence free matching algorithm is powerful enough to isolate a small
number of probable candidates for the final recognition stage. The time consumed
for each character is therefore within tolerable limits, i.e. 1/2 second. The main
cause of mis-recognition is the confusion of similar characters. The
segment relations of these confusion pairs are not significant enough to
discriminate one from another. Although many problems remain to be solved, the
current results are encouraging and inspire us to put further effort into
finding more solutions. Finding different features for recognizing similar
characters will be the direction of our future work.
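The stroke-number tolerance mentioned in item 5 can be illustrated with a simple candidate filter that keeps only dictionary entries whose stroke count differs from that of the input by at most the tolerance. The function name, the dictionary layout and the placeholder labels are hypothetical; only the tolerance value of 2 comes from the text.

```python
def filter_by_stroke_count(input_count, dictionary_counts, tolerance=2):
    """Keep candidates whose stroke count is within the tolerance of the input."""
    return [label for label, count in dictionary_counts.items()
            if abs(count - input_count) <= tolerance]


# Placeholder labels stand in for dictionary characters.
candidates = filter_by_stroke_count(12, {"A": 10, "B": 12, "C": 15, "D": 14})
```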



References

1. Tak Ming Law, Signal Learning Algorithms and Database Architecture for On-line
Chinese Characters Recognition, Proceedings of the 2000 International Workshop on
Multimedia Data Storage, Retrieval, Integration and Applications, Hong Kong
(2000) 68-74.

2. Tak Ming Law, An On-line Chinese Character Recognition, Master of Philosophy
thesis, The Chinese University of Hong Kong, Computer Science and Engineering
Department, (1996).

3. Chi Chung Zhang, Chinese Recognition Techniques, Chinese Signal Processing,
Tsing Hwa University Press, (1992).

4. Sheng-Lin Chou and Wen-Hsiang Tsai, On-Line Chinese Character Recognition
through Stroke-Segment Matching using a New Discrete Iteration Scheme,
Computer Processing of Chinese and Oriental Languages, Vol. 7, No. 1, (1993) 1-20.

5. Tak-Ming Law, The Decision Path Classification for a Segment-Based On-Line
Chinese Character Recognition, Proceedings of the Conference on Applications
of Automation Science and Technology, Hong Kong (1998) 227-231.

6. Tak-Ming Law, Signal Smoothing, Sampling, Interpolation and Stroke
Segmentation Algorithm for On-Line Chinese Character Recognition, Proceedings
of the Second International Conference on Information, Communications &
Signal Processing, Singapore (1999).

7. Tak-Ming Law, Signal Learning Algorithms and Database Architecture for On-Line
Chinese Characters Recognition, Proceedings of the 2000 International
Workshop on Multimedia Data Storage, Retrieval, Integration and Applications,
Hong Kong (2000) 68-74.

8. Tak-Ming Law, Segmentation Analysis and Similarity Measure for Online
Chinese Character Recognition, Proceedings of the International Conference on
Chinese Language Computing, Chicago, Illinois, USA (2000).

9. Mr. Wong, An On-line Chinese Character Recognition, Master of Philosophy
thesis, The Chinese University of Hong Kong, Information Engineering
Department, (1993).




Automatic Detection Algorithm of Connected Segments 
for On-line Chinese Character Recognition 



Tak Ming Law 

Hong Kong Institute of Vocational Education (Morrison Hill) 
Department of Computing, 6 Oi Kwan Road, Wan Chai, Hong Kong. 
Email: tmlaw@vtc.edu.hk 

Abstract. This paper presents a very simple way to detect improperly
connected strokes by breaking all strokes into segments. Once the strokes
of a character have been decomposed into segments, which serve as the basis
of recognition, the connected-stroke problem no longer exists.



1 Introduction 

One of the most popular multimedia devices for entering Chinese characters
into a computer is the on-line Chinese character recognition system.

There are many ways to perform on-line Chinese character recognition.
For example, some researchers utilize individual classifiers [1] to derive the best
final decision from a statistical point of view [2], and others classify characters
by feature extraction [3] or structural methods [4]. The simplest method used for
recognition is template matching [5]. Some work emphasizes character search and
look-up [6]. Relaxation is a well-known matching method that has been employed
for the recognition of Chinese characters [7]. Other methods, such as attributed
string matching by split-and-merge and segment-order free techniques [8], are also
applied in on-line character and numeric digit recognition. Some researchers are
now developing on-line Chinese signature verification using advanced character
recognition techniques [9]. However, some products on the market still generate
incorrect results. The problems may be due to inefficient database structures
and retrieval methods.

On-line Chinese character recognition algorithms are usually based on
comparing the similarities of the individual stroke segments of the input and
reference characters. However, the accuracy of the measurement is often hindered
by improperly connected strokes caused by running handwriting on the electronic
tablet. We have found a very simple way to detect improperly connected strokes:
breaking all the strokes into segments. Once the strokes of a character have been
decomposed into segments, which serve as the basis of recognition, the
connected-stroke problem no longer exists. We start in Section 2 with the
foundation on which our recognition system is based.



Y. Y. Tang et al. (Eds.): WAA 2001, LNCS 2251, pp. 242-247, 2001. 
© Springer-Verlag Berlin Heidelberg 2001







2 Basic Stroke Types (Segment Type) 

In our database, for simplicity and performance, we consider only five basic
types of stroke, which are listed in the following table [10].

Table 1 . Five basic types of segment 



Stroke Name            Symbol
Horizontals            h
Verticals              v
South-West Slanting    p
South-East Slanting    n
Dot                    d


Each of the above five kinds of basic stroke has its own unique direction and no
turning points. In our system, we call these basic stroke types segment types,
and they are the basic elements of the whole Chinese character database. Our
system breaks all compound-segment strokes into segments, which are used as the
elements that represent the entire character. The features of each segment are
analyzed and placed into a single vector.

Segments compose a compound-segment stroke, so the stroke count of a particular
character may differ from its segment count. For example, a character that is
counted as 12 strokes in a regular Chinese database is counted as 16 segments in
our system, since it contains 4 compound-segment strokes that together consist of
8 segments.



3 Connected Segments Handling 

A stroke segment is measured from pen down to pen up. A connected segment is a
segment with Freeman code 1, 2, 3 or 4 that is located between standard segments.
In this system, all segments with Freeman code 1, 2, 3 or 4 can easily be detected
in the input characters. The system either counts them as the ends of segments or,
otherwise, as connected segments. Fig. 1 shows an example of eliminating hooks.
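The detection step can be sketched with the common 8-direction Freeman chain code, in which code k denotes a direction of k × 45 degrees measured counter-clockwise from east. Whether the paper uses exactly this numbering, and a y axis that points upward, is an assumption, as are the function names. Under this convention codes 1 to 4 are the upward and leftward directions, which do not occur in the five standard segment types when they are written in the usual direction.

```python
import math

CONNECTING_CODES = (1, 2, 3, 4)


def freeman_code(dx, dy):
    """8-direction chain code of the displacement (dx, dy), with y pointing up.

    0 = east, 2 = north, 4 = west, 6 = south; odd codes are the diagonals.
    """
    angle = math.atan2(dy, dx)
    return int(round(angle / (math.pi / 4.0))) % 8


def connected_segments(codes):
    """Indices of segments with code 1-4 that lie between two other segments."""
    return [k for k in range(1, len(codes) - 1) if codes[k] in CONNECTING_CODES]


# A horizontal stroke (east), a pen movement up and to the left, and a
# vertical stroke (south), written without lifting the pen.
codes = [freeman_code(10, 0), freeman_code(-5, 5), freeman_code(0, -10)]
linked = connected_segments(codes)
```

On a tablet whose y coordinate grows downward, `dy` would have to be negated before calling `freeman_code`.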



